Abstract

An agent often has a number of hypotheses, and must choose among them based 
on observations, or outcomes of experiments. Each of these observations can be viewed 
as providing evidence for or against various hypotheses. All the attempts to formalize 
this intuition up to now have assumed that associated with each hypothesis $h$ there is
a likelihood function $\mu_h$, which is a probability measure that intuitively describes how
likely each observation is, conditional on $h$ being the correct hypothesis. We consider
an extension of this framework where there is uncertainty as to which of a number 
of likelihood functions is appropriate, and discuss how one formal approach to defining 
evidence, which views evidence as a function from priors to posteriors, can be generalized 
to accommodate this uncertainty. 

1 Introduction 

Consider an agent trying to choose among a number of hypotheses: Is it the case that all 
ravens are black or not? Is a particular coin fair or double-headed? The standard picture 
in such situations is that the agent makes a number of observations, which give varying 
degrees of evidence for or against each of the hypotheses. The following simple example 
illustrates the situation. 

Example 1.1 Suppose that Alice and Bob each have a coin. Alice's coin is double-headed;
Bob's coin is fair. Charlie knows all of this. Alice and Bob give their coins to some third
party, Zoe, who chooses one of the coins, and tosses it. Charlie is not privy to Zoe's choice, 
but gets to see the outcome of the toss. Charlie is interested in two events (which are called 
hypotheses in this context): 

A: the coin is Alice's coin 

B: the coin is Bob's coin. 



*A preliminary version of this paper appeared in the Proceedings of the 21st Conference on Uncertainty 
in Artificial Intelligence, pp. 243-250, 2005. Most of this work was done while the second author was at 
Cornell University. 
Now Charlie observes the coin land heads. What can he say about the probability of the
events $A$ and $B$? If Charlie has no prior probability on $A$ and $B$, then he can draw no
conclusions about their posterior probability; the probability of $A$ could be any number in
$[0, 1]$. The same remains true if the coin lands heads 100 times in a row. ∎

Clearly Charlie learns something from seeing 100 (or even one) coin tosses land heads.
This has traditionally been modeled in terms of evidence: the more times Charlie sees
heads, the more evidence he has for the coin being double-headed. A number of ways of
modeling and quantifying evidence have been proposed in the literature; see [Kyburg
1983] for an overview. We do not want to enter the debate here as to which approach is 
best. Rather, we focus on a different problem regarding evidence, which seems not to have 
been considered before. 

All of the approaches to evidence considered in the literature make use of the likelihood
function. More precisely, they assume that for each hypothesis $h$ of interest, there is a
probability $\mu_h$ (called a likelihood function) on the space of possible observations. In the
example above, if the coin is tossed once, the two possible observations are heads and tails.
Clearly $\mu_A(\mathit{heads}) = 1$ and $\mu_B(\mathit{heads}) = 1/2$. If the coin is tossed 100 times, then there
are $2^{100}$ possible observations (sequences of coin tosses). Again, $\mu_A$ and $\mu_B$ put obvious
probabilities on this space. In particular, if $\mathit{100heads}$ is the observation of seeing 100 heads
in a row, then $\mu_A(\mathit{100heads}) = 1$ and $\mu_B(\mathit{100heads}) = 1/2^{100}$. Most of the approaches
compute the relative weight of evidence of a particular observation $ob$ for two hypotheses
$A$ and $B$ by comparing $\mu_A(ob)$ and $\mu_B(ob)$.

However, in many situations of interest in practice, the hypothesis h does not determine 
a unique likelihood function $\mu_h$. To understand the issues that arise, consider the following
somewhat contrived variant of Example 1.1. 

Example 1.2 Suppose that Alice has two coins, one that is double-headed and one that is 
biased 3/4 towards heads, and chooses which one to give Zoe. Bob still has only one coin, 
which is fair. Again, Zoe chooses either Alice's coin or Bob's coin and tosses it. Charlie, 
who knows the whole setup, sees the coin land heads. What does this tell him about the 
likelihood that the coin tossed was Alice's? ∎

The problem is that now we do not have a probability $\mu_A$ on observations corresponding
to the coin being Alice's coin, since Charlie does not know if Alice's coin is double-headed 
or biased 3/4 towards heads. It seems that there is an obvious solution to this problem. 
We simply split the hypothesis "the coin is Alice's coin" into two hypotheses: 

$A_1$: the coin is Alice's coin and it is double-headed

$A_2$: the coin is Alice's coin and it is the biased coin.

Now we can certainly apply standard techniques for computing evidence to the three hypotheses
$A_1$, $A_2$, and $B$. The question now is what do the answers tell us about the evidence
in favor of the coin being Alice's coin? More generally, how should we model and quantify 
evidence when the likelihood functions themselves are uncertain? 
While Example 1.2 is admittedly contrived, situations like it arise frequently in practice.
For example, Epstein and Schneider [2005] show how multiple likelihoods can arise in investment
decisions in the stock market, and the impact they can have on hedging strategies.¹
For another example, consider a robot equipped with an unreliable sensor for navigation. 
This sensor returns the distance to the wall in front of the robot, with some known error. 
For simplicity, suppose that distances are measured in integral units 0, 1, 2, ..., and that if
the wall is at distance $m$, then the sensor will return a reading of $m - 1$ with probability 1/4,
a reading of $m$ with probability 1/2, and a reading of $m + 1$ with probability 1/4. Suppose
the robot wants to stop if it is close to the wall, where "close" is interpreted as being
within 3 units of the wall, and go forward if it is farther than 3 units. So again, we have two
hypotheses of interest, close and far. However, while for each specific distance $m$ we have a
probability $\mu_m$ on sensor readings, we do not have a probability on sensor readings
corresponding to the hypothesis far: "the robot is farther than 3 units from the wall". While
standard techniques will certainly give us the weight of evidence of a particular sensor reading
for the hypothesis "the robot is distance $m$ from the wall", it is not clear what the weight of
evidence should be for the hypothesis far.
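The sensor model above can be made concrete in a few lines of Python. This is only an illustrative sketch: it assumes (hypothetically) that distances are capped at `MAX_DISTANCE = 10` and that readings are arbitrary integers, so a wall at distance 0 can produce a reading of -1. It shows that each distance $m$ determines a single likelihood function $\mu_m$, while the hypothesis far is compatible with several of them, which disagree about the probability of the same reading.

```python
from fractions import Fraction

MAX_DISTANCE = 10  # hypothetical cap; the text allows distances 0, 1, 2, ... without bound
CLOSE_LIMIT = 3    # "close" means within 3 units of the wall


def sensor_likelihood(m):
    """mu_m: distribution of the sensor reading when the wall is at distance m."""
    return {m - 1: Fraction(1, 4), m: Fraction(1, 2), m + 1: Fraction(1, 4)}


def prob_of_reading(m, reading):
    """mu_m(reading), which is 0 for readings more than one unit away from m."""
    return sensor_likelihood(m).get(reading, Fraction(0))


# "far" covers every distance m > 3, each with its own likelihood function,
# so a single reading has no single probability under "far".
far_distances = range(CLOSE_LIMIT + 1, MAX_DISTANCE + 1)
reading = 4
candidate_values = {prob_of_reading(m, reading) for m in far_distances}
print(sorted(candidate_values))  # [Fraction(0, 1), Fraction(1, 4), Fraction(1, 2)]
```

Representing "far" by the whole family of functions $\mu_m$ with $m > 3$, rather than by one function, is exactly the kind of uncertainty about likelihoods this paper is concerned with.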

We hope that these examples have convinced the reader that there is often likely to be 
uncertainty about likelihoods. Moreover, as we show by considering one particular definition 
of evidence, there are subtleties involved in defining evidence when there is uncertainty 
about likelihoods. Although we focus on only one way of defining evidence, we believe 
that these subtleties will arise no matter how evidence is represented, and that our general 
approach to dealing with the problem can be applied to other approaches (although we have 
not checked the details). 

The approach for determining the weight of evidence that we consider in this paper is 
due to Shafer [1982], and is a generalization of a method advocated by Good [1950]. The 
idea is to assign to every observation and hypothesis a number between 0 and 1 (the weight
of evidence for the hypothesis provided by the observation) that represents how much the
observation supports the hypothesis. The closer a weight is to 1, the more the observation 
supports the hypothesis. This weight of evidence is computed using the likelihood functions 
described earlier. This way of computing the weight of evidence has several good properties, 
and is related to Shafer's theory of belief functions [Shafer 1976]; for instance, the theory 
gives a way to combine the weight of evidence from independent observations. We give 
full details in Section 2. For now, we illustrate how the problems described above manifest 
themselves in Shafer's setting. 

Let an evidence space $\mathcal{E}$ consist of a set $\mathcal{H}$ of possible hypotheses, a set $\mathcal{O}$ of observations,
and a probability $\mu_h$ on observations for each $h \in \mathcal{H}$. We take the weight of evidence for
hypothesis $h$ provided by observation $ob$ in evidence space $\mathcal{E}$, denoted $w_{\mathcal{E}}(ob, h)$, to be

$$w_{\mathcal{E}}(ob,h) = \frac{\mu_h(ob)}{\sum_{h' \in \mathcal{H}} \mu_{h'}(ob)}.$$

It is easy to see that $w_{\mathcal{E}}(ob,\cdot)$ acts like a probability on $\mathcal{H}$, in that $\sum_{h \in \mathcal{H}} w_{\mathcal{E}}(ob,h) = 1$.
With this definition, it is easy to check that the weight of evidence for Alice's coin when
Charlie sees heads in Example 1.1 is 2/3, and the weight of evidence when Charlie sees 100
heads is $2^{100}/(2^{100} + 1)$. As expected, the more often Charlie sees heads, the more evidence
he has in favor of the coin being double-headed (provided that he does not see tails).

¹ Epstein and Schneider present a general model of decision making in the presence of multiple likelihoods,
although they do not attempt to quantify the evidence provided by observations in the presence of multiple
likelihoods.

In Example 1.2, if we consider the three hypotheses $A_1$, $A_2$, and $B$, then the weight of
evidence for $A_1$ when Charlie sees heads is $1/(1 + 3/4 + 1/2) = 4/9$; similarly, the weight of
evidence for $A_2$ is 1/3 and the weight of evidence for $B$ is 2/9. Since weight of evidence acts
like a probability, it might then seem reasonable to take the weight of evidence for $A$ (the
coin used was Alice's coin) to be $4/9 + 1/3 = 7/9$. (Indeed, this approach was implicitly
suggested in our earlier paper [Halpern and Pucella 2006].) But is this reasonable? A first
hint that it might not be is that the weight of evidence for $A$ is higher in this case (7/9)
than in the case where Alice certainly had a double-headed coin (2/3).
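The weights quoted for Examples 1.1 and 1.2 can be reproduced with exact fractions. The following Python sketch implements Shafer's formula as stated above; the helper name `weight_of_evidence` and the dictionary representation are our own choices, not part of the original presentation.

```python
from fractions import Fraction


def weight_of_evidence(likelihoods, ob):
    """Shafer's weight of evidence w(ob, h) = mu_h(ob) / sum over h' of mu_h'(ob).

    `likelihoods` maps each hypothesis h to a dict giving mu_h on observations.
    """
    total = sum(mu[ob] for mu in likelihoods.values())
    if total == 0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: mu[ob] / total for h, mu in likelihoods.items()}


# Example 1.1: Alice's coin is double-headed, Bob's is fair.
ex11 = {
    "A": {"heads": Fraction(1), "tails": Fraction(0)},
    "B": {"heads": Fraction(1, 2), "tails": Fraction(1, 2)},
}
print(weight_of_evidence(ex11, "heads")["A"])  # 2/3

# 100 heads in a row: only the probability of this one observation is needed.
ex11_100 = {"A": {"100heads": Fraction(1)}, "B": {"100heads": Fraction(1, 2**100)}}
w100 = weight_of_evidence(ex11_100, "100heads")["A"]

# Example 1.2 with Alice's coin split into A1 (double-headed) and A2 (biased 3/4).
ex12 = {
    "A1": {"heads": Fraction(1), "tails": Fraction(0)},
    "A2": {"heads": Fraction(3, 4), "tails": Fraction(1, 4)},
    "B": {"heads": Fraction(1, 2), "tails": Fraction(1, 2)},
}
w12 = weight_of_evidence(ex12, "heads")
naive_A = w12["A1"] + w12["A2"]  # the tempting, but questionable, sum
print(w12["A1"], w12["A2"], w12["B"], naive_A)  # 4/9 1/3 2/9 7/9
```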

To analyze this issue, we need an independent way of understanding what evidence is 
telling us. As observed by Halpern and Fagin [1992], weight of evidence can be viewed as 
a function from priors to posteriors. That is, given a prior on hypotheses, we can combine 
the prior with the weight of evidence to get the posterior. In particular, if there are two
hypotheses, say $H_1$ and $H_2$, the weight of evidence for $H_1$ is $\alpha$, and the prior probability of
$H_1$ is $\beta$, then the posterior probability of $H_1$ (that is, the probability of $H_1$ in light of the
evidence) is

$$\frac{\alpha\beta}{\alpha\beta + (1-\alpha)(1-\beta)}.$$

Thus, for example, by deciding to perform an action when the weight of evidence for $A$ is
2/3 (i.e., after Charlie has seen the coin land heads once), Charlie is assured that, if the
prior probability of $A$ is at least .01, then the posterior probability of $A$ is at least 2/101;
similarly, after Charlie has seen 100 heads, if the prior probability of $A$ is at least .01, then
the posterior probability of $A$ is at least $2^{100}/(2^{100} + 99)$.
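A quick check of the two-hypothesis update using exact fractions. The hypothetical helper `posterior_two` evaluates $\alpha\beta/(\alpha\beta + (1-\alpha)(1-\beta))$; because this expression is increasing in the prior $\beta$ (for fixed $0 < \alpha < 1$), evaluating it at a prior of exactly .01 gives the guaranteed lower bounds for any prior of at least .01.

```python
from fractions import Fraction


def posterior_two(weight, prior):
    """Posterior of H1 given weight of evidence alpha and prior beta, with two hypotheses."""
    num = weight * prior
    return num / (num + (1 - weight) * (1 - prior))


one_head = posterior_two(Fraction(2, 3), Fraction(1, 100))
hundred_heads = posterior_two(Fraction(2**100, 2**100 + 1), Fraction(1, 100))
print(one_head)  # 2/101
print(hundred_heads == Fraction(2**100, 2**100 + 99))  # True
```

A weight of exactly 1/2 leaves the prior unchanged, which is one easy way to see that weight of evidence measures how far an observation moves the prior.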

But now consider the situation in Example 1.2. Again, suppose that the prior probability
of $A$ is at least .01. Can we conclude that the posterior probability of $A$ is at
least $.01(7/9)/(.01(7/9) + .99(2/9)) = 7/205$? As we show, we cannot. The calculation
$\alpha\beta/(\alpha\beta + (1-\alpha)(1-\beta))$ is appropriate only when there are two hypotheses. If the
hypotheses $A_1$ and $A_2$ have priors $\alpha_1$ and $\alpha_2$ and weights of evidence $\beta_1$ and $\beta_2$, then the
posterior probability of $A$ is

$$\frac{\alpha_1\beta_1 + \alpha_2\beta_2}{\alpha_1\beta_1 + \alpha_2\beta_2 + (1-\alpha_1-\alpha_2)(1-\beta_1-\beta_2)},$$

which is in general quite different from

$$\frac{(\alpha_1+\alpha_2)(\beta_1+\beta_2)}{(\alpha_1+\alpha_2)(\beta_1+\beta_2) + (1-\alpha_1-\alpha_2)(1-\beta_1-\beta_2)}.$$

Moreover, it is easy to show that if $\beta_1 > \beta_2$ (as is the case here), then, writing
$\alpha = \alpha_1 + \alpha_2$ for the prior probability of $A$, the posterior of $A$ lies somewhere in the interval

$$\left[\frac{\alpha\beta_2}{\alpha\beta_2 + (1-\alpha)(1-\beta_1-\beta_2)},\ \frac{\alpha\beta_1}{\alpha\beta_1 + (1-\alpha)(1-\beta_1-\beta_2)}\right].$$

That is, we get a lower bound on the posterior by acting as if the only possible hypotheses
are $A_2$ and $B$, and we get an upper bound by acting as if the only possible hypotheses are
$A_1$ and $B$.
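The difference between the naive calculation and the true posterior in Example 1.2 can be seen numerically. This sketch assumes a prior of .01 for $A$ and sweeps a hypothetical grid of 101 ways of splitting it between $A_1$ and $A_2$; since the posterior of $A$ is monotone in the split, the grid's endpoints give the exact extremes. It also computes those endpoints directly by acting as if only $\{A_2, B\}$ or only $\{A_1, B\}$ were possible.

```python
from fractions import Fraction


def posterior_two(weight, prior):
    """Two-hypothesis posterior: weight*prior / (weight*prior + (1-weight)*(1-prior))."""
    num = weight * prior
    return num / (num + (1 - weight) * (1 - prior))


def posterior_A(a1, a2, b1, b2):
    """Posterior of A = {A1, A2}; B has prior 1 - a1 - a2 and weight 1 - b1 - b2."""
    num = a1 * b1 + a2 * b2
    return num / (num + (1 - a1 - a2) * (1 - b1 - b2))


b1, b2 = Fraction(4, 9), Fraction(1, 3)   # weights of A1 and A2 when heads is seen
bB = 1 - b1 - b2                          # weight of B, namely 2/9
prior_A = Fraction(1, 100)

naive = posterior_two(b1 + b2, prior_A)   # treats A as one hypothesis with weight 7/9

# Hypothetical grid of ways to split the prior of A between A1 and A2.
splits = [prior_A * Fraction(k, 100) for k in range(101)]
actual = [posterior_A(a1, prior_A - a1, b1, b2) for a1 in splits]

# Interval endpoints: act as if only {A2, B} (lower) or only {A1, B} (upper) were possible.
lower = posterior_two(b2 / (b2 + bB), prior_A)
upper = posterior_two(b1 / (b1 + bB), prior_A)
print(naive, lower, upper)  # 7/205 1/67 2/101
```

In this instance the naive value 7/205 lies above every posterior that a legitimate split of the prior can produce.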
In this paper, we generalize this observation by providing a general approach to dealing
with weight of evidence when the likelihood function is unknown. In the special case when
the likelihood function is known, our approach reduces to Shafer's approach. Roughly
speaking, the idea is to consider all possible evidence spaces consistent with the information.
The intuition is that one of them is the right one, but the agent trying to ascribe a weight
of evidence does not know which. For example, in Example 1.2, the evidence space either
involves hypotheses $\{A_1, B\}$ or hypotheses $\{A_2, B\}$: either Alice's first coin is used or Alice's
second coin is used. We can then compute the weight of evidence for Alice's coin being used 
with respect to each evidence space. This gives us a range of possible weights of evidence, 
which can be used for decision making in a way that seems most appropriate for the problem 
at hand (by considering the max, the min, or some other function of the range). 

The advantage of this approach is that it allows us to consider cases where there are 
correlations between the likelihood functions. For example, suppose that, in the robot 
example, the robot's sensor was manufactured at one of two factories. The sensors at 
factory 1 are more reliable than those of factory 2. Since the same sensor is used for all 
readings, the appropriate evidence space either uses all likelihood functions corresponding 
to factory 1 sensors, or all likelihood functions corresponding to factory 2 sensors. 

The rest of this paper is organized as follows. In Section 2, we review Shafer's approach 
to dealing with evidence. In Section 3, we show how to extend it so as to deal with situations
where the likelihood function is uncertain, and argue that our approach is reasonable. In 
Section 4, we consider how to combine evidence in this setting. We conclude in Section 5. 
The proofs of our technical results are deferred to the appendix. 



2 Evidence: A Review 

We briefly review the notion of evidence and its formalization by Shafer [1982], using some 
terminology from [Halpern and Pucella 2005]. 

We start with a finite set $\mathcal{H}$ of hypotheses, which we take to be mutually exclusive and
exhaustive; thus, exactly one hypothesis holds at any given time. We also have a set $\mathcal{O}$
of observations, which can be understood as outcomes of experiments that can be made.
Finally, we assume that for each hypothesis $h \in \mathcal{H}$, there is a probability $\mu_h$ (often called
a likelihood function) on the observations in $\mathcal{O}$. This is formalized as an evidence space
$\mathcal{E} = (\mathcal{H}, \mathcal{O}, \mu)$, where $\mathcal{H}$ and $\mathcal{O}$ are as above, and $\mu$ is a likelihood mapping, which assigns
to every hypothesis $h \in \mathcal{H}$ a probability measure $\mu(h) = \mu_h$. (For simplicity, we often write
$\mu_h$ for $\mu(h)$ when $\mu$ is clear from context.)

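For an evidence space $\mathcal{E}$, the weight of evidence for hypothesis $h \in \mathcal{H}$ provided by
observation $ob$, written $w_{\mathcal{E}}(ob,h)$, is

$$w_{\mathcal{E}}(ob,h) = \frac{\mu_h(ob)}{\sum_{h' \in \mathcal{H}} \mu_{h'}(ob)}. \tag{1}$$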

The weight of evidence $w_{\mathcal{E}}$ provided by an observation $ob$ with $\sum_{h \in \mathcal{H}} \mu_h(ob) = 0$ is left
undefined by (1). Intuitively, this means that the observation $ob$ is impossible. In the
literature on evidence it is typically assumed that this case never arises. More precisely, it
is assumed that all observations are possible, so that for every observation $ob$, there is a
hypothesis $h$ such that $\mu_h(ob) > 0$. For simplicity, we make the same assumption here. (We
remark that in some application domains this assumption holds because of the structure of
the domain, without needing to be assumed explicitly; see [Halpern and Pucella 2005] for
an example.)

The measure $w_{\mathcal{E}}$ always lies between 0 and 1, with 1 indicating that the observation
provides full evidence for the hypothesis. Moreover, for each fixed observation $ob$ for which
$\sum_{h \in \mathcal{H}} \mu_h(ob) > 0$, we have $\sum_{h \in \mathcal{H}} w_{\mathcal{E}}(ob,h) = 1$, and thus the weight of evidence $w_{\mathcal{E}}$ looks like a
probability measure for each $ob$. While this has some useful technical consequences, one
should not interpret $w_{\mathcal{E}}$ as a probability measure. It is simply a way to assign a weight to
hypotheses given observations, and, as we shall soon see, can be seen as a way to update a 
prior probability on the hypotheses into a posterior probability on those hypotheses, based 
on the observations made. 

Example 2.1 In Example 1.1, the set $\mathcal{H}$ of hypotheses is $\{A, B\}$; the set $\mathcal{O}$ of observations
is simply $\{\mathit{heads}, \mathit{tails}\}$, the possible outcomes of a coin toss. From the discussion following
the description of the example, it follows that $\mu$ assigns the following likelihood functions
to the hypotheses: since $\mu_A(\mathit{heads})$ is the probability that the coin lands heads if it is
Alice's coin (i.e., if it is double-headed), $\mu_A(\mathit{heads}) = 1$ and $\mu_A(\mathit{tails}) = 0$. Similarly,
$\mu_B(\mathit{heads})$ is the probability that the coin lands heads if it is fair, so $\mu_B(\mathit{heads}) = 1/2$ and
$\mu_B(\mathit{tails}) = 1/2$. This can be summarized by the following table:

$$\begin{array}{c|cc}
\mu & A & B \\ \hline
\mathit{heads} & 1 & 1/2 \\
\mathit{tails} & 0 & 1/2
\end{array}$$

Let

$$\mathcal{E} = (\{A, B\}, \{\mathit{heads}, \mathit{tails}\}, \mu).$$

A straightforward computation shows that $w_{\mathcal{E}}(\mathit{heads},A) = 2/3$ and $w_{\mathcal{E}}(\mathit{heads},B) = 1/3$.
Intuitively, the coin landing heads provides more evidence for the hypothesis $A$ than the
hypothesis $B$. Similarly, $w_{\mathcal{E}}(\mathit{tails},A) = 0$ and $w_{\mathcal{E}}(\mathit{tails},B) = 1$. Thus, the coin landing tails
indicates that the coin must be fair. This information can be represented by the following
table:

$$\begin{array}{c|cc}
w_{\mathcal{E}} & A & B \\ \hline
\mathit{heads} & 2/3 & 1/3 \\
\mathit{tails} & 0 & 1
\end{array}$$

∎

It is possible to interpret the weight function w as a prescription for how to update a 
prior probability on the hypotheses into a posterior probability on those hypotheses, after 
having considered the observations made [Halpern and Fagin 1992]. There is a precise sense 
in which $w_{\mathcal{E}}$ can be viewed as a function that maps a prior probability $\mu_0$ on the hypotheses
$\mathcal{H}$ to a posterior probability $\mu_{ob}$, based on observing $ob$, by applying Dempster's Rule of
Combination [Shafer 1976]. That is,

$$\mu_{ob} = \mu_0 \oplus w_{\mathcal{E}}(ob,\cdot), \tag{2}$$
where $\oplus$ combines two probability distributions on $\mathcal{H}$ to get a new probability distribution
on $\mathcal{H}$ as follows:

$$(\mu_1 \oplus \mu_2)(h) = \frac{\mu_1(h)\,\mu_2(h)}{\sum_{h' \in \mathcal{H}} \mu_1(h')\,\mu_2(h')}. \tag{3}$$

(Strictly speaking, $\oplus$ is defined for set functions, that is, functions with domain $2^{\mathcal{H}}$. We
have defined $w_{\mathcal{E}}(ob,\cdot)$ as a function with domain $\mathcal{H}$, but it is clear from (3) that this is all
that is really necessary to compute $\mu_0 \oplus w_{\mathcal{E}}(ob,\cdot)$ in our case.) Note that (3) is not defined
if $\sum_{h \in \mathcal{H}} \mu_1(h)\mu_2(h) = 0$; this means that the update (2) is not defined when the weight
of evidence provided by observation $ob$ goes entirely to hypotheses $h$ with prior probability
$\mu_0(h) = 0$.
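Equations (2) and (3) translate directly into code. In this minimal Python sketch, distributions on $\mathcal{H}$ are dictionaries (an implementation choice of ours), and the function raises an error exactly when the normalizing sum in (3) is 0.

```python
from fractions import Fraction


def dempster(p, q):
    """Equation (3): (p (+) q)(h) = p(h) q(h) / sum over h' of p(h') q(h')."""
    norm = sum(p[h] * q[h] for h in p)
    if norm == 0:
        raise ValueError("combination undefined: the normalizing sum is 0")
    return {h: p[h] * q[h] / norm for h in p}


# Example 2.1: update a prior by w_E(heads, .), as in equation (2).
prior = {"A": Fraction(1, 100), "B": Fraction(99, 100)}
w_heads = {"A": Fraction(2, 3), "B": Fraction(1, 3)}
posterior = dempster(prior, w_heads)
print(posterior["A"])  # 2/101

# The update is undefined if all the weight goes to hypotheses with prior 0.
try:
    dempster({"A": Fraction(0), "B": Fraction(1)}, {"A": Fraction(1), "B": Fraction(0)})
except ValueError as err:
    print(err)
```

Combining with the uniform distribution leaves the other argument unchanged, so a uniform prior simply returns the weight of evidence itself.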

Bayes' Rule is the standard way of updating a prior probability based on an observation, 
but it is only applicable when we have a joint probability distribution on both the hypotheses 
and the observations, something we have not assumed. Dempster's Rule of
Combination essentially "simulates" the effects of Bayes' Rule. The relationship between
Dempster's Rule and Bayes' Rule is made precise by the following well-known theorem. 

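Proposition 2.2 [Halpern and Fagin 1992] Let $\mathcal{E} = (\mathcal{H}, \mathcal{O}, \mu)$ be an evidence space. Suppose
that $P$ is a probability on $\mathcal{H} \times \mathcal{O}$ such that $P(\mathcal{H} \times \{ob\} \mid \{h\} \times \mathcal{O}) = \mu_h(ob)$ for all $h \in \mathcal{H}$
and all $ob \in \mathcal{O}$. Let $\mu_0$ be the probability on $\mathcal{H}$ induced by marginalizing $P$; that is,
$\mu_0(h) = P(\{h\} \times \mathcal{O})$. For $ob \in \mathcal{O}$, let $\mu_{ob} = \mu_0 \oplus w_{\mathcal{E}}(ob,\cdot)$. Then
$\mu_{ob}(h) = P(\{h\} \times \mathcal{O} \mid \mathcal{H} \times \{ob\})$.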

In other words, when we do have a joint probability on the hypotheses and observations, then 
Dempster's Rule of Combination gives us the same result as a straightforward application 
of Bayes' Rule. 
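Proposition 2.2 can be sanity-checked on Example 2.1. The sketch below picks a hypothetical prior $\mu_0(A) = 1/5$, builds the joint distribution $P(h, ob) = \mu_0(h)\mu_h(ob)$, which satisfies the hypotheses of the proposition, and compares Bayes' Rule on $P$ with the Dempster update $\mu_0 \oplus w_{\mathcal{E}}(ob,\cdot)$. It is a numerical illustration, not a proof.

```python
from fractions import Fraction

# Example 2.1 likelihoods mu_h(ob), plus a hypothetical prior mu_0 on {A, B}.
likelihood = {
    "A": {"heads": Fraction(1), "tails": Fraction(0)},
    "B": {"heads": Fraction(1, 2), "tails": Fraction(1, 2)},
}
prior = {"A": Fraction(1, 5), "B": Fraction(4, 5)}

# Joint P on H x O whose conditional on h is mu_h and whose marginal on H is mu_0.
joint = {(h, ob): prior[h] * likelihood[h][ob] for h in likelihood for ob in likelihood[h]}


def bayes_posterior(ob):
    """P({h} x O | H x {ob}) by Bayes' Rule on the joint distribution."""
    p_ob = sum(p for (_, o), p in joint.items() if o == ob)
    return {h: joint[(h, ob)] / p_ob for h in prior}


def dempster_posterior(ob):
    """mu_0 (+) w_E(ob, .), using equations (1) and (3)."""
    total = sum(likelihood[h][ob] for h in likelihood)
    weight = {h: likelihood[h][ob] / total for h in likelihood}
    norm = sum(prior[h] * weight[h] for h in prior)
    return {h: prior[h] * weight[h] / norm for h in prior}


for ob in ("heads", "tails"):
    print(ob, bayes_posterior(ob) == dempster_posterior(ob))  # True for both
```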



3 Evidence with Uncertain Likelihoods 

In Example 1.1, each of the two hypotheses $A$ and $B$ determines a likelihood function.
However, in Example 1.2, the hypothesis $A$ does not determine a likelihood function. By
viewing it as the compound hypothesis $\{A_1, A_2\}$, as we did in the introduction, we can
construct an evidence space with a set $\{A_1, A_2, B\}$ of hypotheses. We then get the following
likelihood mapping $\mu$:

$$\begin{array}{c|ccc}
\mu & A_1 & A_2 & B \\ \hline
\mathit{heads} & 1 & 3/4 & 1/2 \\
\mathit{tails} & 0 & 1/4 & 1/2
\end{array}$$

Taking

$$\mathcal{E} = (\{A_1, A_2, B\}, \{\mathit{heads}, \mathit{tails}\}, \mu),$$

we can compute the following weights of evidence:

$$\begin{array}{c|ccc}
w_{\mathcal{E}} & A_1 & A_2 & B \\ \hline
\mathit{heads} & 4/9 & 1/3 & 2/9 \\
\mathit{tails} & 0 & 1/3 & 2/3
\end{array}$$
If we are now given prior probabilities for $A_1$, $A_2$, and $B$, we can easily use
Proposition 2.2 to compute posterior probabilities for each of these events, and then add the
posterior probabilities of $A_1$ and $A_2$ to get a posterior probability for $A$.

But what if we are given only a prior probability $\mu_0$ for $A$ and $B$, and are not given
probabilities for $A_1$ and $A_2$? As observed in the introduction, if we define
$w_{\mathcal{E}}(\mathit{heads}, A) = w_{\mathcal{E}}(\mathit{heads}, A_1) + w_{\mathcal{E}}(\mathit{heads}, A_2) = 7/9$, and then try to compute the posterior probability of
$A$ given that heads is observed by naively applying the equation in Proposition 2.2, that is,
taking $\mu_{\mathit{heads}}(A) = (\mu_0 \oplus w_{\mathcal{E}}(\mathit{heads},\cdot))(A)$, we get an inappropriate answer. In particular,
the answer is not the posterior probability in general.

To make this concrete, suppose that $\mu_0(A) = .01$. Then, as observed in the introduction,
a naive application of this equation suggests that the posterior probability of $A$ is 7/205.
But suppose that in fact $\mu_0(A_1) = \alpha$ for some $\alpha \in [0, .01]$, so that $\mu_0(A_2) = .01 - \alpha$.
Then applying Proposition 2.2, we see that

$$\mu_{\mathit{heads}}(A) = \frac{\alpha(4/9) + (.01-\alpha)(1/3)}{\alpha(4/9) + (.01-\alpha)(1/3) + .99(2/9)} = \frac{\alpha + .03}{\alpha + 2.01}.$$

As $\alpha$ ranges over $[0, .01]$, this ranges from $.03/2.01 = 1/67$ to $.04/2.02 = 2/101$; it would
equal 7/205 only if $\alpha = .04$, which is impossible since $\alpha \le .01$. That is, no prior on $A_1$
and $A_2$ consistent with $\mu_0$ makes the naive application of the equation in Proposition 2.2
correct: here the naive answer exceeds every possible posterior.

We now present one approach to dealing with the problem, and argue that it is reasonable.

Define a generalized evidence space to be a tuple $\mathcal{G} = (\mathcal{H}, \mathcal{O}, \Lambda)$, where $\Lambda$ is a finite set
of likelihood mappings. As we did in Section 2, we assume that every $\mu \in \Lambda$ makes every
observation possible: for all $\mu \in \Lambda$ and all observations $ob$, there is a hypothesis $h$ such
that $\mu(h)(ob) > 0$. Note for future reference that we can associate with the generalized
evidence space $\mathcal{G} = (\mathcal{H}, \mathcal{O}, \Lambda)$ the set $\mathcal{S}(\mathcal{G}) = \{(\mathcal{H}, \mathcal{O}, \mu) \mid \mu \in \Lambda\}$ of evidence spaces. Thus,
given a generalized evidence space $\mathcal{G}$, we can define the generalized weight of evidence $w_{\mathcal{G}}$
to be the set $\{w_{\mathcal{E}} : \mathcal{E} \in \mathcal{S}(\mathcal{G})\}$ of weights of evidence. We often treat $w_{\mathcal{G}}$ as a set-valued
function, writing $w_{\mathcal{G}}(ob,h)$ for $\{w(ob,h) \mid w \in w_{\mathcal{G}}\}$.

Just as we can combine a prior with the weight of evidence to get a posterior in a
standard evidence space, given a generalized evidence space, we can combine a prior with
a generalized weight of evidence to get a set of posteriors. Given a prior probability $\mu_0$
on a set $\mathcal{H}$ of hypotheses and a generalized weight of evidence $w_{\mathcal{G}}$, let $\mathcal{P}_{\mu_0,ob}$ be the set of
posterior probabilities on $\mathcal{H}$ corresponding to an observation $ob$ and prior $\mu_0$, computed
according to Proposition 2.2:

$$\mathcal{P}_{\mu_0,ob} = \{\mu_0 \oplus w(ob,\cdot) \mid w \in w_{\mathcal{G}},\ \mu_0 \oplus w(ob,\cdot) \text{ defined}\}. \tag{4}$$

Since $\mu_0 \oplus w(ob,\cdot)$ need not always exist for a given $w \in w_{\mathcal{G}}$, the set $\mathcal{P}_{\mu_0,ob}$ is made up only
of those $\mu_0 \oplus w(ob,\cdot)$ that do exist.

Example 3.1 The generalized evidence space for Example 1.2, where Alice's coin is unknown, is

$$\mathcal{G} = (\{A, B\}, \{\mathit{heads}, \mathit{tails}\}, \{\mu_1, \mu_2\}),$$

where $\mu_1(A) = \mu_{A_1}$, $\mu_2(A) = \mu_{A_2}$, and $\mu_1(B) = \mu_2(B) = \mu_B$. Thus, the first likelihood
mapping corresponds to Alice's coin being double-headed, and the second corresponds to
Alice's coin being biased 3/4 towards heads. Then $w_{\mathcal{G}} = \{w_1, w_2\}$, where $w_1(\mathit{heads}, A) = 2/3$
and $w_2(\mathit{heads}, A) = 3/5$. Thus, if $\mu_0(A) = \alpha$, then $\mathcal{P}_{\mu_0,\mathit{heads}}(A) = \{2\alpha/(1+\alpha),\ 3\alpha/(2+\alpha)\}$. ∎
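Equation (4) and Example 3.1 can be illustrated with a short Python sketch, assuming likelihood mappings are represented as dictionaries from hypotheses to likelihood functions. Each mapping in $\Lambda$ gives one ordinary evidence space; updating the same prior in each space yields the set of posteriors. For $\mu_0(A) = \alpha$, the two posteriors of $A$ work out to $2\alpha/(1+\alpha)$ and $3\alpha/(2+\alpha)$, which the sketch checks for $\alpha = .01$.

```python
from fractions import Fraction

mu_A1 = {"heads": Fraction(1), "tails": Fraction(0)}        # double-headed coin
mu_A2 = {"heads": Fraction(3, 4), "tails": Fraction(1, 4)}  # coin biased 3/4 towards heads
mu_B = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}   # fair coin

# Lambda = {mu_1, mu_2}: each likelihood mapping assigns one likelihood function to A and B.
Lambda = [{"A": mu_A1, "B": mu_B}, {"A": mu_A2, "B": mu_B}]


def weight(mapping, ob):
    """Weight of evidence (1) in the evidence space (H, O, mapping)."""
    total = sum(mu[ob] for mu in mapping.values())
    return {h: mu[ob] / total for h, mu in mapping.items()}


def combine(p, q):
    """Dempster's rule (3); returns None where the combination is undefined."""
    norm = sum(p[h] * q[h] for h in p)
    if norm == 0:
        return None
    return {h: p[h] * q[h] / norm for h in p}


def posterior_set(prior, ob):
    """Equation (4), kept as a list because dicts are not hashable."""
    posts = (combine(prior, weight(mapping, ob)) for mapping in Lambda)
    return [p for p in posts if p is not None]


alpha = Fraction(1, 100)
posts = posterior_set({"A": alpha, "B": 1 - alpha}, "heads")
print(sorted(p["A"] for p in posts))  # [Fraction(1, 67), Fraction(2, 101)]
```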
We have now given two approaches for capturing the situation in Example 1.2. The
first involves refining the set of hypotheses (that is, replacing the hypothesis $A$ by $A_1$ and
$A_2$) and using a standard evidence space. The second involves using a generalized evidence
space. How do they compare?

To make this precise, we need to first define what a refinement is. We say that the
evidence space $(\mathcal{H}', \mathcal{O}, \mu')$ refines, or is a refinement of, the generalized evidence space
$(\mathcal{H}, \mathcal{O}, \Lambda)$ via $g$ if $g : \mathcal{H}' \to \mathcal{H}$ is a surjection such that $\mu \in \Lambda$ if and only if, for all $h \in \mathcal{H}$,
there exists some $h' \in g^{-1}(h)$ such that $\mu(h) = \mu'(h')$. For example, the evidence space
$\mathcal{E}$ at the beginning of this section (corresponding to Example 1.2) is a refinement of the
generalized evidence space $\mathcal{G}$ in Example 3.1 via the surjection $g : \{A_1, A_2, B\} \to \{A, B\}$
that maps $A_1$ and $A_2$ to $A$ and $B$ to $B$.

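It is almost immediate from the definition of refinement that $(\mathcal{H}', \mathcal{O}, \mu')$ refines $(\mathcal{H}, \mathcal{O}, \Lambda)$
only if $\Lambda$ has a particularly simple structure.

Proposition 3.2 If $(\mathcal{H}', \mathcal{O}, \mu')$ refines $(\mathcal{H}, \mathcal{O}, \Lambda)$ via $g$, then $\Lambda = \prod_{h \in \mathcal{H}} \mathcal{P}_h$, where
$\mathcal{P}_h = \{\mu'(h') \mid h' \in g^{-1}(h)\}$.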

Intuitively, each hypothesis $h \in \mathcal{H}$ is refined to the set of hypotheses $g^{-1}(h) \subseteq \mathcal{H}'$; moreover,
each likelihood function $\mu(h)$ in a likelihood mapping $\mu \in \Lambda$ is the likelihood function $\mu'(h')$
for some hypothesis $h'$ refining $h$.

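A prior $\mu'$ on $\mathcal{H}'$ extends a prior $\mu_0$ on $\mathcal{H}$ if for all $h \in \mathcal{H}$,

$$\mu'(g^{-1}(h)) = \mu_0(h).$$

Let $Ext(\mu_0)$ consist of all priors on $\mathcal{H}'$ that extend $\mu_0$. Recall that, given a set $\mathcal{P}$ of
probability measures, the lower probability $\mathcal{P}_*(U)$ of a set $U$ is $\inf\{\mu(U) \mid \mu \in \mathcal{P}\}$ and its
upper probability $\mathcal{P}^*(U)$ is $\sup\{\mu(U) \mid \mu \in \mathcal{P}\}$ [Halpern 2003].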

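Proposition 3.3 Let $\mathcal{E} = (\mathcal{H}', \mathcal{O}, \mu')$ be a refinement of the generalized evidence space
$\mathcal{G} = (\mathcal{H}, \mathcal{O}, \Lambda)$ via $g$. For all $ob \in \mathcal{O}$ and all $h \in \mathcal{H}$, we have

$$(\mathcal{P}_{\mu_0,ob})_*(h) = \{\mu' \oplus w_{\mathcal{E}}(ob,\cdot) \mid \mu' \in Ext(\mu_0)\}_*(g^{-1}(h))$$

and

$$(\mathcal{P}_{\mu_0,ob})^*(h) = \{\mu' \oplus w_{\mathcal{E}}(ob,\cdot) \mid \mu' \in Ext(\mu_0)\}^*(g^{-1}(h)).$$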

In other words, if we consider the sets of posteriors obtained by either (i) updating
a prior probability $\mu_0$ by the generalized weight of evidence of an observation in $\mathcal{G}$ or (ii)
updating the set of priors extending $\mu_0$ by the weight of evidence of the same observation
in $\mathcal{E}$, the bounds on those two sets are the same. Therefore, this proposition shows that,
given a generalized evidence space $\mathcal{G}$, if there is an evidence space $\mathcal{E}$ that refines it, then the
weight of evidence $w_{\mathcal{G}}$ gives us essentially the same information as $w_{\mathcal{E}}$. But is there always
an evidence space $\mathcal{E}$ that refines a generalized evidence space? That is, can we always
understand a generalized weight of evidence in terms of a refinement? As we now show, we
cannot always do this.
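Proposition 3.3 can be illustrated (not proved) on Example 1.2 with $\mu_0(A) = .01$. The sketch compares the posteriors of $g^{-1}(A) = \{A_1, A_2\}$ obtained from priors in $Ext(\mu_0)$ on the refinement with the posteriors of $A$ obtained from the generalized evidence space of Example 3.1. The grid over $Ext(\mu_0)$ is a hypothetical discretization; because the posterior is monotone in how the prior of $A$ is split, its endpoints already give the exact bounds.

```python
from fractions import Fraction

heads = {"A1": Fraction(1), "A2": Fraction(3, 4), "B": Fraction(1, 2)}  # mu'_{h'}(heads)
g = {"A1": "A", "A2": "A", "B": "B"}                                     # refinement map
alpha = Fraction(1, 100)                                                 # mu_0(A)


def weights(lik):
    """Weight of evidence (1) for one observation, given h -> mu_h(ob)."""
    total = sum(lik.values())
    return {h: v / total for h, v in lik.items()}


def combine(p, q):
    """Dempster's rule (3) on distributions over the same hypotheses."""
    norm = sum(p[h] * q[h] for h in p)
    return {h: p[h] * q[h] / norm for h in p}


# Refined side: priors mu' in Ext(mu_0), on a hypothetical grid of splits of alpha.
refined = []
for k in range(101):
    a1 = alpha * Fraction(k, 100)
    post = combine({"A1": a1, "A2": alpha - a1, "B": 1 - alpha}, weights(heads))
    refined.append(sum(p for h, p in post.items() if g[h] == "A"))  # mass of g^{-1}(A)

# Generalized side: one evidence space over {A, B} per likelihood mapping.
generalized = []
for coin in ("A1", "A2"):
    w = weights({"A": heads[coin], "B": heads["B"]})
    generalized.append(combine({"A": alpha, "B": 1 - alpha}, w)["A"])

print(min(refined) == min(generalized), max(refined) == max(generalized))  # True True
```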

Let $\mathcal{G}$ be a generalized evidence space $(\mathcal{H}, \mathcal{O}, \Lambda)$. Note that if $\mathcal{E}$ refines $\mathcal{G}$ then, roughly
speaking, the likelihood mappings in $\Lambda$ consist of all possible ways of combining the
likelihood functions corresponding to the hypotheses in $\mathcal{H}$. We now formalize this property. A
set $\Lambda$ of likelihood mappings is uncorrelated if there exist sets $\mathcal{P}_h$ of probability measures,
one for each $h \in \mathcal{H}$, such that

$$\Lambda = \prod_{h \in \mathcal{H}} \mathcal{P}_h = \{\mu \mid \mu(h) \in \mathcal{P}_h \text{ for all } h \in \mathcal{H}\}.$$

(We say $\Lambda$ is correlated if it is not uncorrelated.) A generalized evidence space $(\mathcal{H}, \mathcal{O}, \Lambda)$
is uncorrelated if $\Lambda$ is uncorrelated.

Observe that if $(\mathcal{H}', \mathcal{O}, \mu')$ refines $(\mathcal{H}, \mathcal{O}, \Lambda)$ via $g$, then $(\mathcal{H}, \mathcal{O}, \Lambda)$ is uncorrelated since,
by Proposition 3.2, $\Lambda = \prod_{h \in \mathcal{H}} \mathcal{P}_h$, where $\mathcal{P}_h = \{\mu'(h') \mid h' \in g^{-1}(h)\}$. Not only is
every refinement uncorrelated, but every uncorrelated evidence space can be viewed as a
refinement.

Proposition 3.4 Let $\mathcal{G}$ be a generalized evidence space. There exists an evidence space $\mathcal{E}$
that refines $\mathcal{G}$ if and only if $\mathcal{G}$ is uncorrelated.
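For the "if" direction of Proposition 3.4, one natural construction (offered as an illustration; it may differ from the proof in the appendix) introduces one refined hypothesis $(h, i)$ for each candidate likelihood function in $\mathcal{P}_h$ and lets $g$ map $(h, i)$ to $h$. The Python sketch below does this for the uncorrelated space of Example 3.1 and checks that the likelihood mappings allowed by the refinement are exactly $\prod_h \mathcal{P}_h$.

```python
from fractions import Fraction
from itertools import product

# Per-hypothesis sets P_h for the uncorrelated space of Example 3.1; each likelihood
# function on {heads, tails} is written as the pair (prob. of heads, prob. of tails).
P = {
    "A": [(Fraction(1), Fraction(0)), (Fraction(3, 4), Fraction(1, 4))],
    "B": [(Fraction(1, 2), Fraction(1, 2))],
}
hyps = sorted(P)

# Hypothetical construction: one refined hypothesis (h, i) per candidate likelihood.
refined_mu = {(h, i): lik for h in hyps for i, lik in enumerate(P[h])}
g = {hp: hp[0] for hp in refined_mu}  # surjection from H' onto H


def mappings_allowed_by_refinement():
    """Every mu with mu(h) = mu'(h') for some h' in g^{-1}(h), for each h in H."""
    preimages = [[hp for hp in refined_mu if g[hp] == h] for h in hyps]
    return {tuple(refined_mu[hp] for hp in choice) for choice in product(*preimages)}


Lambda = set(product(*(P[h] for h in hyps)))  # the uncorrelated set: product of the P_h
print(mappings_allowed_by_refinement() == Lambda)  # True
```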

Thus, if a situation can be modeled using an uncorrelated generalized evidence space, then 
it can also be modeled by refining the set of hypotheses and using a simple evidence space. 
The uncorrelated case has a further advantage. It leads to a simple formula for calculating
the posterior in the special case that there are only two hypotheses (which is the case that 
has been considered most often in the literature, often to the exclusion of other cases). 

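Given a generalized evidence space $\mathcal{G} = (\mathcal{H}, \mathcal{O}, \Lambda)$ and the corresponding generalized
weight of evidence $w_{\mathcal{G}}$, we can define upper and lower weights of evidence, determined by
the maximum and minimum values in the range, somewhat analogous to the notions of
upper and lower probability. Define the upper weight of evidence function $\overline{w}_{\mathcal{G}}$ by taking

$$\overline{w}_{\mathcal{G}}(ob,h) = \sup\{w(ob,h) \mid w \in w_{\mathcal{G}}\}.$$

Similarly, define the lower weight of evidence function $\underline{w}_{\mathcal{G}}$ by taking

$$\underline{w}_{\mathcal{G}}(ob,h) = \inf\{w(ob,h) \mid w \in w_{\mathcal{G}}\}.$$

These upper and lower weights of evidence can be used to compute the bounds on the
posteriors obtained by updating a prior probability via the generalized weight of evidence
of an observation, in the case where $\mathcal{G}$ is uncorrelated and there are two hypotheses.

Proposition 3.5 Let $\mathcal{G} = (\mathcal{H}, \mathcal{O}, \Lambda)$ be an uncorrelated generalized evidence space.

(a) The following inequalities hold when the denominators are nonzero:

$$(\mathcal{P}_{\mu_0,ob})^*(h) \le \frac{\overline{w}_{\mathcal{G}}(ob,h)\,\mu_0(h)}{\overline{w}_{\mathcal{G}}(ob,h)\,\mu_0(h) + \sum_{h' \ne h} \underline{w}_{\mathcal{G}}(ob,h')\,\mu_0(h')}, \tag{5}$$

$$(\mathcal{P}_{\mu_0,ob})_*(h) \ge \frac{\underline{w}_{\mathcal{G}}(ob,h)\,\mu_0(h)}{\underline{w}_{\mathcal{G}}(ob,h)\,\mu_0(h) + \sum_{h' \ne h} \overline{w}_{\mathcal{G}}(ob,h')\,\mu_0(h')}. \tag{6}$$

If $|\mathcal{H}| = 2$, these inequalities can be taken to be equalities.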
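(b) The following equalities hold:

$$\overline{w}_{\mathcal{G}}(ob,h) = \frac{(\mathcal{P}_h)^*(ob)}{(\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob)},$$

$$\underline{w}_{\mathcal{G}}(ob,h) = \frac{(\mathcal{P}_h)_*(ob)}{(\mathcal{P}_h)_*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})^*(ob)},$$

where $\mathcal{P}_h = \{\mu(h) \mid \mu \in \Lambda\}$ for all $h \in \mathcal{H}$.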

Thus, if we have an uncorrelated generalized evidence space with two hypotheses, we can
compute the bounds on the posteriors $\mathcal{P}_{\mu_0,ob}$ in terms of upper and lower weights of evidence
using Proposition 3.5(a), which consists of equalities in that case. Moreover, we can compute
the upper and lower weights of evidence using Proposition 3.5(b). As we now show, the
inequalities in Proposition 3.5(a) can be strict if there are more than two hypotheses.
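Proposition 3.5(b) and the two-hypothesis case of Proposition 3.5(a) can be checked by brute force. The numbers in this sketch are hypothetical: an uncorrelated space with hypotheses $A$ and $B$, where the candidate probabilities of one fixed observation are $\{1, 3/4\}$ under $A$ and $\{1/2, 1/3\}$ under $B$, and a prior of .01 on $A$. The code compares the closed forms with the sup and inf over all likelihood mappings.

```python
from fractions import Fraction
from itertools import product

# Illustrative uncorrelated space with two hypotheses: for each h, the set of
# values mu(h)(ob) for one fixed observation ob (hypothetical numbers).
P_ob = {"A": [Fraction(1), Fraction(3, 4)], "B": [Fraction(1, 2), Fraction(1, 3)]}
hyps = sorted(P_ob)


def all_weights():
    """w(ob, .) for every likelihood mapping in the product of the sets P_h."""
    out = []
    for choice in product(*(P_ob[h] for h in hyps)):
        total = sum(choice)
        out.append({h: v / total for h, v in zip(hyps, choice)})
    return out


def upper_weight(h):
    """Proposition 3.5(b): sup for h against inf for every other hypothesis."""
    top = max(P_ob[h])
    return top / (top + sum(min(P_ob[k]) for k in hyps if k != h))


def lower_weight(h):
    """Proposition 3.5(b): inf for h against sup for every other hypothesis."""
    bot = min(P_ob[h])
    return bot / (bot + sum(max(P_ob[k]) for k in hyps if k != h))


ws = all_weights()
prior = {"A": Fraction(1, 100), "B": Fraction(99, 100)}
posts_A = [prior["A"] * w["A"] / sum(prior[k] * w[k] for k in hyps) for w in ws]

# Right-hand sides of (5) and (6) for h = A (the only other hypothesis is B).
ua, la = upper_weight("A"), lower_weight("A")
rhs5 = ua * prior["A"] / (ua * prior["A"] + lower_weight("B") * prior["B"])
rhs6 = la * prior["A"] / (la * prior["A"] + upper_weight("B") * prior["B"])
print(ua, la)  # 3/4 3/5
print(rhs5 == max(posts_A), rhs6 == min(posts_A))  # True True
```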

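Example 3.6 Let $\mathcal{H} = \{D, E, F\}$ and $\mathcal{O} = \{X, Y\}$, and consider the two probability
measures $\mu_1$ and $\mu_2$ on $\mathcal{O}$, where $\mu_1(X) = 1/3$ and $\mu_2(X) = 2/3$. Let $\mathcal{G} = (\mathcal{H}, \mathcal{O}, \Lambda)$, where
$\Lambda = \{\mu \mid \mu(h) \in \{\mu_1, \mu_2\} \text{ for all } h \in \mathcal{H}\}$. Clearly, $\Lambda$ is uncorrelated. Let $\mu_0$ be the uniform prior on $\mathcal{H}$,
so that $\mu_0(D) = \mu_0(E) = \mu_0(F) = 1/3$. Using Proposition 3.5(b), we can compute that the
upper and lower weights of evidence are as given in the following tables:

$$\begin{array}{c|ccc}
\overline{w}_{\mathcal{G}} & D & E & F \\ \hline
X & 1/2 & 1/2 & 1/2 \\
Y & 1/2 & 1/2 & 1/2
\end{array}
\qquad
\begin{array}{c|ccc}
\underline{w}_{\mathcal{G}} & D & E & F \\ \hline
X & 1/5 & 1/5 & 1/5 \\
Y & 1/5 & 1/5 & 1/5
\end{array}$$

The uniform measure is the identity for $\oplus$, and therefore $\mu_0 \oplus w(ob,\cdot) = w(ob,\cdot)$. It follows
that $\mathcal{P}_{\mu_0,X} = \{w(X,\cdot) \mid w \in w_{\mathcal{G}}\}$. Hence, $(\mathcal{P}_{\mu_0,X})^*(D) = 1/2$ and $(\mathcal{P}_{\mu_0,X})_*(D) = 1/5$.
But the right-hand sides of (5) and (6) are 5/9 and 1/6, respectively, and similarly for
hypotheses $E$ and $F$. Thus, in this case, the inequalities in Proposition 3.5(a) are strict. ∎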
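The numbers in Example 3.6 can be verified by enumerating all eight likelihood mappings in $\Lambda$. This Python sketch records only $\mu(h)(X)$ for each hypothesis (enough to compute weights for the observation $X$), computes the exact upper and lower posteriors of $D$ under the uniform prior, and compares them with the right-hand sides of (5) and (6).

```python
from fractions import Fraction
from itertools import product

hyps = ["D", "E", "F"]
mu1_X, mu2_X = Fraction(1, 3), Fraction(2, 3)  # mu_1(X) and mu_2(X)
prior = {h: Fraction(1, 3) for h in hyps}      # uniform prior mu_0

# Lambda: every assignment of mu_1 or mu_2 to each hypothesis; record mu(h)(X).
choices = list(product([mu1_X, mu2_X], repeat=len(hyps)))


def posterior(choice):
    """mu_0 (+) w(X, .) for one likelihood mapping."""
    total = sum(choice)
    w = {h: v / total for h, v in zip(hyps, choice)}
    norm = sum(prior[h] * w[h] for h in hyps)
    return {h: prior[h] * w[h] / norm for h in hyps}


posts_D = [posterior(c)["D"] for c in choices]

# Upper and lower weights of evidence via Proposition 3.5(b).
hi, lo = max(mu1_X, mu2_X), min(mu1_X, mu2_X)
upper_w = hi / (hi + 2 * lo)  # 2 = number of other hypotheses
lower_w = lo / (lo + 2 * hi)

others = [h for h in hyps if h != "D"]
rhs5 = upper_w * prior["D"] / (upper_w * prior["D"] + sum(lower_w * prior[h] for h in others))
rhs6 = lower_w * prior["D"] / (lower_w * prior["D"] + sum(upper_w * prior[h] for h in others))

print(max(posts_D), min(posts_D))  # 1/2 1/5
print(rhs5, rhs6)                  # 5/9 1/6
```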

While uncorrelated generalized evidence spaces are certainly of interest, correlated
spaces arise in natural settings. To see this, first consider the following somewhat
contrived example.

Example 3.7 Consider the following variant of Example 1.2. Alice has two coins, one that 
is double-headed and one that is biased 3/4 towards heads, and chooses which one to give 
Zoe. Bob also has two coins, one that is fair and one that is biased 2/3 towards tails, and 
chooses which one to give Zoe. Zoe chooses one of the two coins she was given and tosses 
it. The hypotheses are $\{A, B\}$ and the observations are $\{\mathit{heads}, \mathit{tails}\}$, as in Example 1.2.
The likelihood function $\mu_1$ for Alice's double-headed coin is given by $\mu_1(\mathit{heads}) = 1$, while
the likelihood function $\mu_2$ for Alice's biased coin is given by $\mu_2(\mathit{heads}) = 3/4$. Similarly, the
likelihood function $\mu_3$ for Bob's fair coin is given by $\mu_3(\mathit{heads}) = 1/2$, and the likelihood
function $\mu_4$ for Bob's biased coin is given by $\mu_4(\mathit{heads}) = 1/3$.

If Alice and Bob each make their choice of which coin to give Zoe independently, we can 
use the following generalized evidence space to model the situation: 



$$\mathcal{G}_1 = (\{A, B\}, \{\mathit{heads}, \mathit{tails}\}, \mathcal{A}_1),$$

where

$$\mathcal{A}_1 = \{(\mu_1, \mu_3), (\mu_1, \mu_4), (\mu_2, \mu_3), (\mu_2, \mu_4)\}.$$

Clearly, $\mathcal{A}_1$ is uncorrelated, since it is equal to $\{\mu_1, \mu_2\} \times \{\mu_3, \mu_4\}$.

On the other hand, suppose that Alice and Bob agree beforehand that either Alice gives 
Zoe her double-headed coin and Bob gives Zoe his fair coin, or Alice gives Zoe her biased 
coin and Bob gives Zoe his biased coin. This situation can be modeled using the following 
generalized evidence space: 

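$$\mathcal{G}_2 = (\{A, B\}, \{\mathit{heads}, \mathit{tails}\}, \mathcal{A}_2),$$

where

$$\mathcal{A}_2 = \{(\mu_1, \mu_3), (\mu_2, \mu_4)\}.$$

Here, note that $\mathcal{A}_2$ is a correlated set of likelihood mappings: it is not the product $\{\mu_1, \mu_2\} \times \{\mu_3, \mu_4\}$ of its projections, since it omits $(\mu_1, \mu_4)$ and $(\mu_2, \mu_3)$. ∎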
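A small Python sketch, assuming each likelihood function on $\{\mathit{heads}, \mathit{tails}\}$ can be identified with its probability of heads, makes the distinction concrete. It tests whether a set of mappings equals the product of its projections (which is what being uncorrelated amounts to for a nonempty set) and, as an extra illustration not computed in the text, reports the largest and smallest weight $w(\mathit{heads}, A)$ under each set. The function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Each likelihood function on {heads, tails} is represented by its probability of heads.
mu1, mu2, mu3, mu4 = F(1), F(3, 4), F(1, 2), F(1, 3)

# A likelihood mapping is a pair (mu(A), mu(B)).
A1 = {(mu1, mu3), (mu1, mu4), (mu2, mu3), (mu2, mu4)}
A2 = {(mu1, mu3), (mu2, mu4)}


def is_uncorrelated(mappings):
    """True iff the (nonempty) set equals the product of its projections P_h."""
    arity = len(next(iter(mappings)))
    projections = [{m[i] for m in mappings} for i in range(arity)]
    return set(product(*projections)) == set(mappings)


def weight_heads_A(mapping):
    """w(heads, A) = mu(A)(heads) / (mu(A)(heads) + mu(B)(heads))."""
    a, b = mapping
    return a / (a + b)


for name, mappings in [("A1", A1), ("A2", A2)]:
    ws = [weight_heads_A(m) for m in mappings]
    print(name, is_uncorrelated(mappings), max(ws), min(ws))
# A1 True 3/4 3/5
# A2 False 9/13 2/3
```

Although $\mathcal{A}_1$ and $\mathcal{A}_2$ have the same projections, the correlation in $\mathcal{A}_2$ changes the extreme values of $w(\mathit{heads}, A)$ from $3/4$ and $3/5$ to $9/13$ and $2/3$.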

While this example is artificial, the example in the introduction, where the robot's
sensors could have come from either factory 1 or factory 2, is perhaps a more realistic case
where correlated evidence spaces arise. The key point here is that such examples show that
refining the hypotheses is not always enough to capture a situation: by Proposition 3.4, a
correlated generalized evidence space has no refinement.

4 Combining Evidence 

An important property of Shafer's [1982] representation of evidence is that it is possible to 
combine the weight of evidence of independent observations to obtain the weight of evidence 
of a sequence of observations. The purpose of this section is to show that our framework 
enjoys a similar property, but, rather unsurprisingly, new subtleties arise due to the presence 
of uncertainty. For simplicity, in this section we concentrate exclusively on combining the 
evidence of a sequence of two observations; the general case follows in a straightforward 
way. 

Recall how combining evidence is handled in Shafer's approach. Let $\mathcal{E} = (\mathcal{H}, \mathcal{O}, \mu)$ be
an evidence space. We define the likelihood functions $\mu_h$ on pairs of observations by taking
$\mu_h(\langle ob_1, ob_2 \rangle) = \mu_h(ob_1)\mu_h(ob_2)$. In other words, the probability of observing a particular
sequence of observations given $h$ is the product of the probabilities of making each observation
in the sequence. Thus, we are implicitly assuming that the observations are independent.
It is well known (see, for example, [Halpern and Fagin 1992, Theorem 4.3]) that Dempster's
Rule of Combination can be used to combine evidence; that is,

$$w_{\mathcal{E}}(\langle ob_1, ob_2 \rangle, \cdot) = w_{\mathcal{E}}(ob_1, \cdot) \oplus w_{\mathcal{E}}(ob_2, \cdot).$$

If we let $\mu_0$ be a prior probability on the hypotheses, and $\mu_{\langle ob_1, ob_2 \rangle}$ be the probability on
the hypotheses after observing $ob_1$ and $ob_2$, we can verify that

$$\mu_{\langle ob_1, ob_2 \rangle} = \mu_0 \oplus w_{\mathcal{E}}(\langle ob_1, ob_2 \rangle, \cdot).$$
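Both identities can be spot-checked with exact rational arithmetic. The sketch below uses a made-up three-hypothesis evidence space (the hypotheses `h1`, `h2`, `h3`, the observations `a` and `b`, the prior, and all likelihoods are hypothetical) together with the product rule for sequences stated above. It confirms that combining single-observation weights with Dempster's rule gives the weight of the pair, and that updating the prior with that weight agrees with a direct Bayesian computation.

```python
from fractions import Fraction as F

# Hypothetical evidence space: three hypotheses, observations "a" and "b".
HYPS = ["h1", "h2", "h3"]
LIK = {  # LIK[h][ob] = mu_h(ob)
    "h1": {"a": F(1, 2), "b": F(1, 2)},
    "h2": {"a": F(1, 4), "b": F(3, 4)},
    "h3": {"a": F(9, 10), "b": F(1, 10)},
}
PRIOR = {"h1": F(1, 2), "h2": F(1, 3), "h3": F(1, 6)}


def normalize(scores):
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


def seq_likelihood(h, obs):
    """mu_h of a sequence = product of mu_h of each observation (independence)."""
    result = F(1)
    for ob in obs:
        result *= LIK[h][ob]
    return result


def weight(obs):
    """w_E(obs, .) for a single observation or a sequence of observations."""
    return normalize({h: seq_likelihood(h, obs) for h in HYPS})


def dempster(p, q):
    """Dempster's rule of combination for two probability measures on HYPS."""
    return normalize({h: p[h] * q[h] for h in HYPS})


combined = dempster(weight(["a"]), weight(["b"]))
direct = weight(["a", "b"])
bayes = normalize({h: PRIOR[h] * seq_likelihood(h, ["a", "b"]) for h in HYPS})
print(combined == direct, dempster(PRIOR, direct) == bayes)  # True True
```

Exact fractions are used so that the equalities can be tested with `==` rather than a floating-point tolerance.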

Here we are assuming that exactly one hypothesis holds, and it holds each time we make an 
observation. That is, if Zoe picks the double-headed coin, she uses it for both coin tosses. 
Example 4.1 Recall Example 2.1, where Alice just has a double-headed coin and Bob just
has a fair coin. Suppose that Zoe, after being given the coins and choosing one of them,
tosses it twice, and it lands heads both times. It is straightforward to compute that

$$
\begin{array}{c|cc}
w_{\mathcal{E}} & A & B \\ \hline
\langle \mathit{heads}, \mathit{heads} \rangle & 4/5 & 1/5 \\
\langle \mathit{heads}, \mathit{tails} \rangle & 0 & 1 \\
\langle \mathit{tails}, \mathit{heads} \rangle & 0 & 1 \\
\langle \mathit{tails}, \mathit{tails} \rangle & 0 & 1
\end{array}
$$

Not surprisingly, if either of the observations is tails, the coin cannot be Alice's. In the
case where the observations are $\langle \mathit{heads}, \mathit{heads} \rangle$, the evidence for the coin being Alice's (that
is, double-headed) is greater than if a single heads is observed, since from Example 2.1,
$w_{\mathcal{E}}(\mathit{heads}, A) = 2/3$. This agrees with our intuition that seeing two heads in a row provides
more evidence for a coin to be double-headed than seeing a single heads. ∎
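The table can be reproduced with a few lines of Python. The sketch assumes only the product rule for sequences of observations given earlier in this section; the helper names are our own.

```python
from fractions import Fraction as F
from itertools import product

P_HEADS = {"A": F(1), "B": F(1, 2)}  # Alice: double-headed; Bob: fair


def lik(h, ob):
    """Probability of a single observation under hypothesis h."""
    return P_HEADS[h] if ob == "heads" else 1 - P_HEADS[h]


def weight(obs):
    """w_E(obs, .) where mu_h of a sequence is the product over its observations."""
    scores = {}
    for h in P_HEADS:
        score = F(1)
        for ob in obs:
            score *= lik(h, ob)
        scores[h] = score
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


for obs in product(["heads", "tails"], repeat=2):
    w = weight(obs)
    print(obs, w["A"], w["B"])
print(weight(["heads"])["A"])  # 2/3, the single-toss weight from Example 2.1
```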

How should we combine evidence for a sequence of observations when we have a generalized
evidence space? That depends on how we interpret the assumption that the "same"
hypothesis holds for each observation. In a generalized evidence space, we have possibly
many likelihood functions for each hypothesis. The real issue is whether we use the same
likelihood function each time we evaluate an observation, or whether we can use a different
likelihood function associated with that hypothesis each time. The following example shows that this
distinction can be critical.

Example 4.2 Consider Example 1.2 again, where Alice has two coins (one double-headed,
one biased toward heads), and Bob has a fair coin. Alice chooses a coin and gives it to
Zoe; Bob gives his coin to Zoe. As we observed, there are two likelihood mappings in this
case, giving rise to the weights of evidence we called $w_1$ and $w_2$; $w_1$ corresponds to Alice's
coin being double-headed, and $w_2$ corresponds to the coin being biased 3/4 towards heads.
Suppose that Zoe tosses the coin twice. Since she is tossing the same coin, it seems most
appropriate to consider the generalized weight of evidence

$$\{w' \mid w'(\langle ob_1, ob_2 \rangle, \cdot) = w_i(ob_1, \cdot) \oplus w_i(ob_2, \cdot),\ i \in \{1, 2\}\}.$$

On the other hand, suppose Zoe first chooses whether she will always use Alice's or
Bob's coin. If she chooses Bob, then she obviously uses his coin for both tosses. If she
chooses Alice, then before each toss she asks Alice for a coin and tosses it; however, she does not
have to use the same coin of Alice's for each toss. Now the likelihood function associated
with each observation can change. Thus, the appropriate generalized weight of evidence is

$$\{w' \mid w'(\langle ob_1, ob_2 \rangle, \cdot) = w_i(ob_1, \cdot) \oplus w_j(ob_2, \cdot),\ i, j \in \{1, 2\}\}.$$

∎
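To see concretely that the two readings can differ, the Python sketch below computes both generalized weights for the sequences $\langle \mathit{heads}, \mathit{heads} \rangle$ and $\langle \mathit{heads}, \mathit{tails} \rangle$, using $w_1$ (Alice's double-headed coin) and $w_2$ (her coin biased 3/4 towards heads) against Bob's fair coin. The numbers it produces are our own computation, not figures from the paper, and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

HEADS_A = [F(1), F(3, 4)]  # Alice's coins: index 0 gives w_1, index 1 gives w_2
HEADS_B = F(1, 2)          # Bob's fair coin


def lik(p_heads, ob):
    return p_heads if ob == "heads" else 1 - p_heads


def weight(i, ob):
    """w_{i+1}(ob, .) over the hypotheses A and B."""
    a, b = lik(HEADS_A[i], ob), lik(HEADS_B, ob)
    return {"A": a / (a + b), "B": b / (a + b)}


def dempster(p, q):
    """Dempster's rule of combination for two probability measures on {A, B}."""
    norm = sum(p[h] * q[h] for h in p)
    return {h: p[h] * q[h] / norm for h in p}


def same_coin(ob1, ob2):
    """First reading: the same likelihood mapping is used for both tosses."""
    return [dempster(weight(i, ob1), weight(i, ob2)) for i in range(2)]


def coin_may_change(ob1, ob2):
    """Second reading: the mapping may differ from toss to toss."""
    return [dempster(weight(i, ob1), weight(j, ob2))
            for i, j in product(range(2), repeat=2)]


for obs in [("heads", "heads"), ("heads", "tails")]:
    same = [w["A"] for w in same_coin(*obs)]
    change = [w["A"] for w in coin_may_change(*obs)]
    print(obs, max(same), min(same), max(change), min(change))
```

For $\langle \mathit{heads}, \mathit{heads} \rangle$ the two readings give the same upper and lower weights for $A$ ($4/5$ and $9/13$), but for $\langle \mathit{heads}, \mathit{tails} \rangle$ the upper weight for $A$ is $3/7$ under the first reading and $1/2$ under the second.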

Fundamentally, combining evidence in generalized evidence spaces relies on Dempster's
rule of combination, just as in Shafer's approach. However, as Example 4.2 shows, the
exact details depend on our understanding of the experiment. While the first approach
used in Example 4.2 seems more appropriate in most cases that we could think of, we suspect
that there will be cases where something like the second approach may be appropriate.

5 Conclusion

In the literature on evidence, it is generally assumed that there is a single likelihood function
associated with each hypothesis. There are natural examples, however, that violate this
assumption. While it may appear that a simple step of refining the set of hypotheses allows
us to use standard techniques, we have shown that this approach can lead to counterintuitive
results when evidence is used as a basis for making decisions. To solve this problem,
we proposed a generalization of a popular approach to representing evidence. This generalization
behaves correctly under updating, and gives the same bounds on the posterior
probability as those obtained by refining the set of hypotheses when there is no correlation
between the various likelihood functions for the hypotheses. As we have shown, this is the one
situation where we can identify a generalized evidence space with the space obtained by
refining the hypotheses. One advantage of our approach is that we can also reason about
situations where the likelihood functions are correlated, something that cannot be done by
refining the set of hypotheses.

We have also looked at how to combine evidence in a generalized evidence space. While
the basic ideas from standard evidence spaces carry over (the combination is essentially
obtained using Dempster's rule of combination), the exact details of how this combination
should be performed depend on the specifics of how the likelihood functions change for
each observation. A more detailed dynamic model would be helpful in understanding the
combination of evidence in a generalized evidence space setting; we leave this exploration
for future work.
A Proofs

We first establish some results that are useful for proving Proposition 3.3. The following 
lemma gives an alternate way of updating a prior probability by a weight of evidence. 

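Lemma A.1 Let $\mathcal{E} = (\mathcal{H}, \mathcal{O}, \mu)$ be an evidence space. For all $ob$ and $H \subseteq \mathcal{H}$,

$$(\mu_0 \oplus w_{\mathcal{E}}(ob, \cdot))(H) = \frac{\sum_{h \in H} \mu_0(h)\mu(h)(ob)}{\sum_{h \in \mathcal{H}} \mu_0(h)\mu(h)(ob)}.$$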
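Proof. By the definition of $\oplus$ and $w_{\mathcal{E}}$,

$$
\begin{aligned}
(\mu_0 \oplus w_{\mathcal{E}}(ob, \cdot))(H)
&= \frac{\sum_{h \in H} \mu_0(h) w_{\mathcal{E}}(ob, h)}{\sum_{h \in \mathcal{H}} \mu_0(h) w_{\mathcal{E}}(ob, h)} \\
&= \frac{\sum_{h \in H} \mu_0(h)\mu(h)(ob)}{\sum_{h \in \mathcal{H}} \mu_0(h)\mu(h)(ob)},
\end{aligned}
$$

since the normalizing factor $\sum_{h' \in \mathcal{H}} \mu(h')(ob)$ in $w_{\mathcal{E}}(ob, h)$ cancels. ∎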
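Lemma A.1 says that the posterior of a set of hypotheses can be computed directly from the unnormalized likelihoods $\mu(h)(ob)$. A minimal Python check, using a hypothetical prior and hypothetical likelihoods for a single fixed observation:

```python
from fractions import Fraction as F

# Hypothetical prior and likelihoods mu(h)(ob) for one fixed observation ob.
PRIOR = {"h1": F(1, 2), "h2": F(1, 3), "h3": F(1, 6)}
LIK_OB = {"h1": F(1, 5), "h2": F(1, 2), "h3": F(3, 4)}

total = sum(LIK_OB.values())
w = {h: LIK_OB[h] / total for h in LIK_OB}  # weight of evidence w_E(ob, .)

# Left-hand side: update the prior with Dempster's rule, then sum over H.
norm = sum(PRIOR[h] * w[h] for h in PRIOR)
posterior = {h: PRIOR[h] * w[h] / norm for h in PRIOR}


def lemma_a1(H):
    """Right-hand side of Lemma A.1, which never normalizes the likelihoods."""
    num = sum(PRIOR[h] * LIK_OB[h] for h in H)
    den = sum(PRIOR[h] * LIK_OB[h] for h in PRIOR)
    return num / den


H = {"h1", "h3"}
print(sum(posterior[h] for h in H), lemma_a1(H))  # 27/47 27/47
```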



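Some notation will simplify the presentation of the other results. Suppose that $\mathcal{E} =
(\mathcal{H}', \mathcal{O}, \mu')$ is a refinement of $\mathcal{G} = (\mathcal{H}, \mathcal{O}, \mathcal{A})$ via $g$. Given a prior probability $\mu_0$ on $\mathcal{H}$,
recall that $\mathit{Ext}(\mu_0)$ consists of all priors on $\mathcal{H}'$ that extend $\mu_0$. Let $\mathit{Ext}_{\mathcal{A}}(\mu_0)$ be the subset
of $\mathit{Ext}(\mu_0)$ consisting of all priors $\mu'_0$ on $\mathcal{H}'$ such that, for all $h \in \mathcal{H}$, there exists some
$h' \in g^{-1}(h)$ such that $\mu'_0(h') = \mu_0(h)$. In other words, the probability measures in $\mathit{Ext}_{\mathcal{A}}(\mu_0)$
place all of the probability $\mu_0(h)$ that they assign to $g^{-1}(h)$ onto a single hypothesis in $g^{-1}(h)$. If $\mu \in \mathcal{A}$, let $\mathcal{E}_\mu$ be
the evidence space $(\mathcal{H}, \mathcal{O}, \mu)$.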

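Lemma A.2 Let $\mu_0$ be a prior probability on $\mathcal{H}$.

(a) For every $\mu \in \mathcal{A}$, there is a $\mu'_0 \in \mathit{Ext}_{\mathcal{A}}(\mu_0)$ such that, for all $h \in \mathcal{H}$ and $ob \in \mathcal{O}$,
$(\mu_0 \oplus w_{\mathcal{E}_\mu}(ob, \cdot))(h) = (\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h))$.

(b) For every $\mu'_0 \in \mathit{Ext}_{\mathcal{A}}(\mu_0)$, there is a $\mu \in \mathcal{A}$ such that, for all $h \in \mathcal{H}$ and $ob \in \mathcal{O}$,
$(\mu_0 \oplus w_{\mathcal{E}_\mu}(ob, \cdot))(h) = (\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h))$.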

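Proof. Let $\mu_0$ be a prior probability on $\mathcal{H}$. To prove (a), let $\mu$ be a likelihood mapping in $\mathcal{A}$.
By the definition of refinement, there is a function $f_\mu : \mathcal{H} \to \mathcal{H}'$ such that $f_\mu(h) \in g^{-1}(h)$
and $\mu(h) = \mu'(f_\mu(h))$. (Of course, there can be more than one such function.) Define $\mu'_0$
by taking $\mu'_0(h') = \mu_0(h)$ if $h' = f_\mu(h)$ for some $h$, and $\mu'_0(h') = 0$ otherwise. Clearly,
$\mu'_0 \in \mathit{Ext}_{\mathcal{A}}(\mu_0)$, and

$$
\begin{aligned}
(\mu_0 \oplus w_{\mathcal{E}_\mu}(ob, \cdot))(h)
&= \frac{\mu_0(h)\mu(h)(ob)}{\sum_{\hat{h} \in \mathcal{H}} \mu_0(\hat{h})\mu(\hat{h})(ob)} \quad \text{[by Lemma A.1]} \\
&= \frac{\mu'_0(f_\mu(h))\mu'(f_\mu(h))(ob)}{\sum_{\hat{h} \in \mathcal{H}} \mu'_0(f_\mu(\hat{h}))\mu'(f_\mu(\hat{h}))(ob)} \quad \text{[by definition of } \mu'_0 \text{ and } f_\mu\text{]} \\
&= \frac{\sum_{h' \in g^{-1}(h)} \mu'_0(h')\mu'(h')(ob)}{\sum_{h' \in \mathcal{H}'} \mu'_0(h')\mu'(h')(ob)} \quad \text{[adding zero terms]} \\
&= (\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)) \quad \text{[by Lemma A.1]}.
\end{aligned}
$$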



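The proof of (b) is analogous. Let $\mu'_0$ be a prior probability in $\mathit{Ext}_{\mathcal{A}}(\mu_0)$. Define
$f_{\mu'_0} : \mathcal{H} \to \mathcal{H}'$ so that for all $h \in \mathcal{H}$, $f_{\mu'_0}(h)$ is an $h' \in g^{-1}(h)$ such that $\mu_0(h) =
\mu'_0(h')$ (such an $h'$ exists because $\mu'_0 \in \mathit{Ext}_{\mathcal{A}}(\mu_0)$, and it is unique whenever $\mu_0(h) > 0$). Again, by the
definition of refinement, the mapping $\mu$ defined by $\mu(h) = \mu'(f_{\mu'_0}(h))$ is in $\mathcal{A}$. A straightforward
computation shows that

$$
\begin{aligned}
(\mu_0 \oplus w_{\mathcal{E}_\mu}(ob, \cdot))(h)
&= \frac{\mu_0(h)\mu(h)(ob)}{\sum_{\hat{h} \in \mathcal{H}} \mu_0(\hat{h})\mu(\hat{h})(ob)} \quad \text{[by Lemma A.1]} \\
&= \frac{\mu'_0(f_{\mu'_0}(h))\mu'(f_{\mu'_0}(h))(ob)}{\sum_{\hat{h} \in \mathcal{H}} \mu'_0(f_{\mu'_0}(\hat{h}))\mu'(f_{\mu'_0}(\hat{h}))(ob)} \quad \text{[by definition of } f_{\mu'_0} \text{ and } \mu\text{]} \\
&= \frac{\sum_{h' \in g^{-1}(h)} \mu'_0(h')\mu'(h')(ob)}{\sum_{h' \in \mathcal{H}'} \mu'_0(h')\mu'(h')(ob)} \quad \text{[adding zero terms]} \\
&= (\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)) \quad \text{[by Lemma A.1]}.
\end{aligned}
$$

∎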


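Proposition 3.2 If $(\mathcal{H}', \mathcal{O}, \mu')$ refines $(\mathcal{H}, \mathcal{O}, \mathcal{A})$ via $g$, then $\mathcal{A} = \prod_{h \in \mathcal{H}} \mathcal{P}_h$, where $\mathcal{P}_h =
\{\mu'(h') \mid h' \in g^{-1}(h)\}$.

Proof. Suppose that $(\mathcal{H}', \mathcal{O}, \mu')$ refines $(\mathcal{H}, \mathcal{O}, \mathcal{A})$ via $g$. For $h \in \mathcal{H}$, let $\mathcal{P}_h = \{\mu'(h') \mid h' \in g^{-1}(h)\}$.
We show that $\mathcal{A} = \prod_{h \in \mathcal{H}} \mathcal{P}_h$. By the definition of refinement, $\mu \in \mathcal{A}$ if and only
if, for all $h \in \mathcal{H}$, there exists some $h' \in g^{-1}(h)$ such that $\mu(h) = \mu'(h')$, which is the
case if and only if, for all $h \in \mathcal{H}$, $\mu(h) \in \mathcal{P}_h$, that is, $\mu \in \prod_{h \in \mathcal{H}} \mathcal{P}_h$. Thus, $\mathcal{A} = \prod_{h \in \mathcal{H}} \mathcal{P}_h$. ∎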

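Proposition 3.3 Let $\mathcal{E} = (\mathcal{H}', \mathcal{O}, \mu')$ be a refinement of the generalized evidence space
$\mathcal{G} = (\mathcal{H}, \mathcal{O}, \mathcal{A})$ via $g$. For all $ob \in \mathcal{O}$ and all $h \in \mathcal{H}$, we have

$$(\mathcal{P}_{\mu_0,ob})^*(h) = \{\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot) \mid \mu'_0 \in \mathit{Ext}(\mu_0)\}^*(g^{-1}(h))$$

and

$$(\mathcal{P}_{\mu_0,ob})_*(h) = \{\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot) \mid \mu'_0 \in \mathit{Ext}(\mu_0)\}_*(g^{-1}(h)).$$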

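Proof. We prove the first equality; the second follows by a similar argument. First, we
prove that $(\mathcal{P}_{\mu_0,ob})^*(h) \le \{\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot) \mid \mu'_0 \in \mathit{Ext}(\mu_0)\}^*(g^{-1}(h))$. This follows almost
immediately from Lemma A.2, which says that for all $w \in \mathbf{w}_{\mathcal{G}}$ (that is, $w = w_{\mathcal{E}_\mu}$ for some $\mu \in \mathcal{A}$),
there is a measure $\mu'_0 \in \mathit{Ext}_{\mathcal{A}}(\mu_0) \subseteq \mathit{Ext}(\mu_0)$ such that

$$
\begin{aligned}
(\mu_0 \oplus w(ob, \cdot))(h) &= (\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)) \\
&\le \{\mu''_0 \oplus w_{\mathcal{E}}(ob, \cdot) \mid \mu''_0 \in \mathit{Ext}(\mu_0)\}^*(g^{-1}(h)).
\end{aligned}
$$

Since $w \in \mathbf{w}_{\mathcal{G}}$ was chosen arbitrarily, by the properties of sup, we have

$$
\begin{aligned}
(\mathcal{P}_{\mu_0,ob})^*(h) &= \sup\{(\mu_0 \oplus w(ob, \cdot))(h) \mid w \in \mathbf{w}_{\mathcal{G}}\} \\
&\le \{\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot) \mid \mu'_0 \in \mathit{Ext}(\mu_0)\}^*(g^{-1}(h)),
\end{aligned}
$$

as required.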

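To prove the reverse inequality, it suffices to show that for every $ob$ and $h$, and for every
$\mu'_0 \in \mathit{Ext}(\mu_0)$, there is a measure $\mu''_0 \in \mathit{Ext}_{\mathcal{A}}(\mu_0)$ such that

$$(\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)) \le (\mu''_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)). \tag{7}$$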
To prove (7), we first define a function $f : \mathcal{H} \to \mathcal{H}'$ such that $f(h)$ is a hypothesis
in $g^{-1}(h)$ that maximizes $\mu'(h'')(ob)$ over all $h'' \in g^{-1}(h)$, and $f(\hat{h})$ for $\hat{h} \ne h$ is a
hypothesis in $g^{-1}(\hat{h})$ that minimizes $\mu'(h'')(ob)$ over all $h'' \in g^{-1}(\hat{h})$.

Define $\mu''_0$ as follows:

$$
\mu''_0(h') =
\begin{cases}
\mu_0(\hat{h}) & \text{if } h' = f(\hat{h}) \text{ for some } \hat{h} \in \mathcal{H}, \\
0 & \text{otherwise.}
\end{cases}
$$

Clearly, $\mu''_0$ is in $\mathit{Ext}_{\mathcal{A}}(\mu_0)$. We now show that (7) holds. There are two cases.
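If $\sum_{h' \in g^{-1}(h)} \mu'_0(h')\mu'(h')(ob) = 0$, then

$$
\begin{aligned}
(\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h))
&= \frac{\sum_{h' \in g^{-1}(h)} \mu'_0(h')\mu'(h')(ob)}{\sum_{h' \in \mathcal{H}'} \mu'_0(h')\mu'(h')(ob)} \quad \text{[by Lemma A.1]} \\
&= 0 \\
&\le (\mu''_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)) \quad \text{[since } \mu''_0 \oplus w_{\mathcal{E}}(ob, \cdot) \text{ is a probability]}.
\end{aligned}
$$

Thus, (7) holds in this case.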

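If $\sum_{h' \in g^{-1}(h)} \mu'_0(h')\mu'(h')(ob) > 0$, then

$$
\begin{aligned}
(\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h))
&= \frac{\sum_{h' \in g^{-1}(h)} \mu'_0(h')\mu'(h')(ob)}{\sum_{h' \in \mathcal{H}'} \mu'_0(h')\mu'(h')(ob)} \quad \text{[by Lemma A.1]} \\
&= \frac{1}{1 + \dfrac{\sum_{h' \in \mathcal{H}' - g^{-1}(h)} \mu'_0(h')\mu'(h')(ob)}{\sum_{h' \in g^{-1}(h)} \mu'_0(h')\mu'(h')(ob)}} \\
&\le \frac{1}{1 + \dfrac{\sum_{\hat{h} \in \mathcal{H} - \{h\}} \mu'(f(\hat{h}))(ob) \sum_{h' \in g^{-1}(\hat{h})} \mu'_0(h')}{\mu'(f(h))(ob) \sum_{h' \in g^{-1}(h)} \mu'_0(h')}} \quad \text{[by definition of } f\text{]} \\
&= \frac{1}{1 + \dfrac{\sum_{\hat{h} \in \mathcal{H} - \{h\}} \mu''_0(f(\hat{h}))\mu'(f(\hat{h}))(ob)}{\mu''_0(f(h))\mu'(f(h))(ob)}} \quad \text{[by definition of } \mu''_0\text{, since } \mu'_0 \text{ extends } \mu_0\text{]} \\
&= \frac{1}{1 + \dfrac{\sum_{h' \in \mathcal{H}' - g^{-1}(h)} \mu''_0(h')\mu'(h')(ob)}{\sum_{h' \in g^{-1}(h)} \mu''_0(h')\mu'(h')(ob)}} \quad \text{[since } \mu''_0(h') = 0 \text{ if } h' \notin f(\mathcal{H})\text{]} \\
&= \frac{\sum_{h' \in g^{-1}(h)} \mu''_0(h')\mu'(h')(ob)}{\sum_{h' \in \mathcal{H}'} \mu''_0(h')\mu'(h')(ob)} \\
&= (\mu''_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)) \quad \text{[by Lemma A.1]}.
\end{aligned}
$$

Therefore, (7) holds in this case as well.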

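Now by Lemma A.2, corresponding to this $\mu''_0$, there exists some $\mu \in \mathcal{A}$ such that
$(\mu_0 \oplus w_{\mathcal{E}_\mu}(ob, \cdot))(h) = (\mu''_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h))$. Thus,

$$
\begin{aligned}
(\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)) &\le (\mu''_0 \oplus w_{\mathcal{E}}(ob, \cdot))(g^{-1}(h)) \\
&= (\mu_0 \oplus w_{\mathcal{E}_\mu}(ob, \cdot))(h) \\
&\le (\mathcal{P}_{\mu_0,ob})^*(h).
\end{aligned}
$$

Since $\mu'_0 \in \mathit{Ext}(\mu_0)$ was chosen arbitrarily, by the properties of sup, we have

$$\{\mu'_0 \oplus w_{\mathcal{E}}(ob, \cdot) \mid \mu'_0 \in \mathit{Ext}(\mu_0)\}^*(g^{-1}(h)) \le (\mathcal{P}_{\mu_0,ob})^*(h),$$

as required. ∎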
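Proposition 3.3 can be spot-checked numerically on Example 3.6, using the refinement constructed in the proof of Proposition 3.4. The Python sketch below (an illustration under these assumptions, not a proof) computes posteriors on the refined space with Lemma A.1, takes the maximum and minimum over the finitely many priors in $\mathit{Ext}_{\mathcal{A}}(\mu_0)$, and checks that 1000 randomly generated priors in $\mathit{Ext}(\mu_0)$ stay between them. The sampling scheme and helper names are our own.

```python
import random
from fractions import Fraction as F
from itertools import product

# Refinement of Example 3.6 built as in Proposition 3.4: H' = {(h, nu)}, g((h, nu)) = h.
HYPS = ["D", "E", "F"]
LIK_X = {"mu1": F(1, 3), "mu2": F(2, 3)}  # mu1(X) and mu2(X)
REFINED = [(h, nu) for h in HYPS for nu in LIK_X]
PRIOR = {h: F(1, 3) for h in HYPS}  # uniform mu_0 on H


def block_posterior(ext, h):
    """(mu_0' (+) w_E(X, .))(g^{-1}(h)), computed with Lemma A.1."""
    num = sum(ext[hp] * LIK_X[hp[1]] for hp in REFINED if hp[0] == h)
    den = sum(ext[hp] * LIK_X[hp[1]] for hp in REFINED)
    return num / den


def ext_a():
    """Ext_A(mu_0): all of mu_0(h) sits on a single element of g^{-1}(h)."""
    for choice in product(LIK_X, repeat=len(HYPS)):
        ext = {hp: F(0) for hp in REFINED}
        for h, nu in zip(HYPS, choice):
            ext[(h, nu)] = PRIOR[h]
        yield ext


def random_ext(rng):
    """A random element of Ext(mu_0): split mu_0(h) between (h, mu1) and (h, mu2)."""
    ext = {}
    for h in HYPS:
        t = F(rng.randint(0, 100), 100)
        ext[(h, "mu1")] = t * PRIOR[h]
        ext[(h, "mu2")] = (1 - t) * PRIOR[h]
    return ext


best = max(block_posterior(e, "D") for e in ext_a())
worst = min(block_posterior(e, "D") for e in ext_a())
rng = random.Random(0)
samples = [block_posterior(random_ext(rng), "D") for _ in range(1000)]
print(best, worst, all(worst <= s <= best for s in samples))  # 1/2 1/5 True
```

The extremes $1/2$ and $1/5$ agree with $(\mathcal{P}_{\mu_0,X})^*(D)$ and $(\mathcal{P}_{\mu_0,X})_*(D)$ from Example 3.6, and, as in the proof, they are attained inside $\mathit{Ext}_{\mathcal{A}}(\mu_0)$.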

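Proposition 3.4 Let $\mathcal{G}$ be a generalized evidence space. There exists an evidence space $\mathcal{E}$
that refines $\mathcal{G}$ if and only if $\mathcal{G}$ is uncorrelated.

Proof. The forward direction is exactly Proposition 3.2. For the converse, suppose that
$\mathcal{G} = (\mathcal{H}, \mathcal{O}, \mathcal{A})$ is uncorrelated, so that $\mathcal{A} = \prod_{h \in \mathcal{H}} \mathcal{P}_h$ for some sets $\mathcal{P}_h$. Let $\mathcal{H}' = \{(h, \nu) \mid
h \in \mathcal{H}, \nu \in \mathcal{P}_h\}$, define $\mu'$ by taking $\mu'((h, \nu)) = \nu$, and set $\mathcal{E} = (\mathcal{H}', \mathcal{O}, \mu')$. We show
that $\mathcal{E}$ refines $\mathcal{G}$ via $g$, where $g((h, \nu)) = h$. Since $\mathcal{A} = \prod_{h \in \mathcal{H}} \mathcal{P}_h$, if $\mu \in \mathcal{A}$, then for every
$h \in \mathcal{H}$ we have $\mu(h) \in \mathcal{P}_h$, so there is an $h' \in g^{-1}(h)$ (namely, $h' = (h, \mu(h))$) such that $\mu(h) = \mu'(h')$, by
definition of $\mu'$. Conversely, suppose that $\mu$ is such that for all $h \in \mathcal{H}$, there exists some $h' \in g^{-1}(h)$
such that $\mu(h) = \mu'(h')$. Then $h' = (h, \nu)$ for some $\nu \in \mathcal{P}_h$, and thus $\mu(h) = \mu'(h') = \nu \in \mathcal{P}_h$. Hence
$\mu \in \prod_{h \in \mathcal{H}} \mathcal{P}_h = \mathcal{A}$. This proves that $\mathcal{E}$ refines $\mathcal{G}$. ∎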
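The construction in the proof can be sketched in Python, representing likelihood functions by labels (an assumption: distinct labels stand for distinct measures). The sketch builds $\mathcal{H}'$, $\mu'$, and $g$ from the projections of a set of mappings, then lists every mapping that satisfies the refinement condition. Applied to the sets $\mathcal{A}_1$ and $\mathcal{A}_2$ of Example 3.7, it recovers the uncorrelated $\mathcal{A}_1$ exactly, while for the correlated $\mathcal{A}_2$ it produces all four pairs; consistent with Proposition 3.2, any refinement yields a product set and so cannot capture $\mathcal{A}_2$. The function names are hypothetical.

```python
from itertools import product

# Likelihood functions are represented by labels; distinct labels stand for distinct measures.
HYPS = ["A", "B"]


def refine(projections):
    """Proposition 3.4: H' = {(h, nu) | nu in P_h}, mu'((h, nu)) = nu, g((h, nu)) = h."""
    refined = [(h, nu) for h in HYPS for nu in sorted(projections[h])]
    mu_prime = {hp: hp[1] for hp in refined}
    g = {hp: hp[0] for hp in refined}
    return refined, mu_prime, g


def refined_mappings(refined, mu_prime, g, labels):
    """All mappings mu such that each mu(h) equals mu'(h') for some h' in g^{-1}(h)."""
    result = set()
    for values in product(sorted(labels), repeat=len(HYPS)):
        mu = dict(zip(HYPS, values))
        if all(any(g[hp] == h and mu_prime[hp] == mu[h] for hp in refined) for h in HYPS):
            result.add(values)
    return result


LABELS = {"mu1", "mu2", "mu3", "mu4"}
A1 = {("mu1", "mu3"), ("mu1", "mu4"), ("mu2", "mu3"), ("mu2", "mu4")}  # uncorrelated
A2 = {("mu1", "mu3"), ("mu2", "mu4")}                                  # correlated

for name, A in [("A1", A1), ("A2", A2)]:
    projections = {h: {m[i] for m in A} for i, h in enumerate(HYPS)}
    space = refine(projections)
    print(name, refined_mappings(*space, LABELS) == A)
# A1 True
# A2 False
```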



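Proposition 3.5 Let $\mathcal{G} = (\mathcal{H}, \mathcal{O}, \mathcal{A})$ be an uncorrelated generalized evidence space.

(a) The following inequalities hold when the denominators are nonzero:

$$(\mathcal{P}_{\mu_0,ob})^*(h) \le \frac{\overline{w}_{\mathcal{G}}(ob, h)\mu_0(h)}{\overline{w}_{\mathcal{G}}(ob, h)\mu_0(h) + \sum_{h' \ne h} \underline{w}_{\mathcal{G}}(ob, h')\mu_0(h')}$$

$$(\mathcal{P}_{\mu_0,ob})_*(h) \ge \frac{\underline{w}_{\mathcal{G}}(ob, h)\mu_0(h)}{\underline{w}_{\mathcal{G}}(ob, h)\mu_0(h) + \sum_{h' \ne h} \overline{w}_{\mathcal{G}}(ob, h')\mu_0(h')}$$

If $|\mathcal{H}| = 2$, these inequalities can be taken to be equalities.

(b) The following equalities hold:

$$\overline{w}_{\mathcal{G}}(ob, h) = \frac{(\mathcal{P}_h)^*(ob)}{(\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob)}$$

$$\underline{w}_{\mathcal{G}}(ob, h) = \frac{(\mathcal{P}_h)_*(ob)}{(\mathcal{P}_h)_*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})^*(ob)}$$

where $\mathcal{P}_h = \{\mu(h) \mid \mu \in \mathcal{A}\}$, for all $h \in \mathcal{H}$.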

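Proof. For part (a), we just prove the first inequality; the second follows by a symmetric
argument. Assume that $\overline{w}_{\mathcal{G}}(ob, h)\mu_0(h) + \sum_{h' \ne h} \underline{w}_{\mathcal{G}}(ob, h')\mu_0(h') > 0$. It is clearly sufficient
to show that for all $w \in \mathbf{w}_{\mathcal{G}}$,

$$(\mu_0 \oplus w(ob, \cdot))(h) \le \frac{\overline{w}_{\mathcal{G}}(ob, h)\mu_0(h)}{\overline{w}_{\mathcal{G}}(ob, h)\mu_0(h) + \sum_{h' \ne h} \underline{w}_{\mathcal{G}}(ob, h')\mu_0(h')}.$$

The desired inequality then follows by properties of sup.

Given $w \in \mathbf{w}_{\mathcal{G}}$, by definition of $\overline{w}_{\mathcal{G}}$ and $\underline{w}_{\mathcal{G}}$, $w(ob, h) \le \overline{w}_{\mathcal{G}}(ob, h)$ and $\underline{w}_{\mathcal{G}}(ob, h) \le
w(ob, h)$, for all $h \in \mathcal{H}$ and $ob \in \mathcal{O}$. Thus,

$$\sum_{h' \ne h} \mu_0(h)\mu_0(h') w(ob, h)\underline{w}_{\mathcal{G}}(ob, h') \le \sum_{h' \ne h} \mu_0(h)\mu_0(h')\overline{w}_{\mathcal{G}}(ob, h) w(ob, h'),$$

so

$$
\begin{aligned}
&\mu_0(h)\mu_0(h) w(ob, h)\overline{w}_{\mathcal{G}}(ob, h) + \sum_{h' \ne h} \mu_0(h)\mu_0(h') w(ob, h)\underline{w}_{\mathcal{G}}(ob, h') \\
&\quad \le \mu_0(h)\mu_0(h) w(ob, h)\overline{w}_{\mathcal{G}}(ob, h) + \sum_{h' \ne h} \mu_0(h)\mu_0(h')\overline{w}_{\mathcal{G}}(ob, h) w(ob, h').
\end{aligned}
$$

Factoring $\mu_0(h) w(ob, h)$ out of the left-hand side and $\mu_0(h)\overline{w}_{\mathcal{G}}(ob, h)$ out of the right-hand side and
dividing, it easily follows that

$$(\mu_0 \oplus w(ob, \cdot))(h) = \frac{\mu_0(h) w(ob, h)}{\mu_0(h) w(ob, h) + \sum_{h' \ne h} \mu_0(h') w(ob, h')} \le \frac{\mu_0(h)\overline{w}_{\mathcal{G}}(ob, h)}{\mu_0(h)\overline{w}_{\mathcal{G}}(ob, h) + \sum_{h' \ne h} \mu_0(h')\underline{w}_{\mathcal{G}}(ob, h')},$$

as required.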

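If $|\mathcal{H}| = 2$, we show that the inequality can be strengthened into an equality. Assume
$\mathcal{H} = \{h_1, h_2\}$. Without loss of generality (the reverse inequality is the first part of (a)), it suffices to show that

$$(\mathcal{P}_{\mu_0,ob})^*(h_1) \ge \frac{\mu_0(h_1)\overline{w}_{\mathcal{G}}(ob, h_1)}{\mu_0(h_1)\overline{w}_{\mathcal{G}}(ob, h_1) + \mu_0(h_2)\underline{w}_{\mathcal{G}}(ob, h_2)}. \tag{8}$$

The key step in the argument is establishing that for $w \in \mathbf{w}_{\mathcal{G}}$, if $\overline{w}_{\mathcal{G}}(ob, h_1) = w(ob, h_1)$,
then $\underline{w}_{\mathcal{G}}(ob, h_2) = w(ob, h_2)$. We know that for every fixed $ob$ and every $w$, $w(ob, h_1) =
1 - w(ob, h_2)$. If $w(ob, h_1) = \overline{w}_{\mathcal{G}}(ob, h_1)$ and $w(ob, h_2) > \underline{w}_{\mathcal{G}}(ob, h_2)$, then there must exist
$w' \in \mathbf{w}_{\mathcal{G}}$ with $w(ob, h_2) > w'(ob, h_2)$; but then we must have $w(ob, h_1) = 1 - w(ob, h_2) <
1 - w'(ob, h_2) = w'(ob, h_1)$, contradicting the fact that $w(ob, h_1) = \overline{w}_{\mathcal{G}}(ob, h_1)$. Thus,
$w(ob, h_2) \le \underline{w}_{\mathcal{G}}(ob, h_2)$, so that $w(ob, h_2) = \underline{w}_{\mathcal{G}}(ob, h_2)$. To prove (8), we now proceed as
follows. Let $w \in \mathbf{w}_{\mathcal{G}}$ be such that $w(ob, h_1) = \overline{w}_{\mathcal{G}}(ob, h_1)$. (We know such a $w$ exists since
$\mathbf{w}_{\mathcal{G}}$ is finite.) We now get that

$$
\begin{aligned}
(\mathcal{P}_{\mu_0,ob})^*(h_1) &\ge (\mu_0 \oplus w(ob, \cdot))(h_1) \\
&= \frac{\mu_0(h_1) w(ob, h_1)}{\mu_0(h_1) w(ob, h_1) + \mu_0(h_2) w(ob, h_2)} \\
&= \frac{\mu_0(h_1)\overline{w}_{\mathcal{G}}(ob, h_1)}{\mu_0(h_1)\overline{w}_{\mathcal{G}}(ob, h_1) + \mu_0(h_2)\underline{w}_{\mathcal{G}}(ob, h_2)},
\end{aligned}
$$

as required.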
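The equality case can be checked on the two-hypothesis setting of Example 1.2 (Alice's coin is double-headed or biased 3/4 towards heads, Bob's is fair), which is uncorrelated. The prior $\mu_0(A) = 1/4$, $\mu_0(B) = 3/4$ is a hypothetical choice made to avoid the uniform case. For each observation, the Python sketch returns the actual upper posterior of $A$, its bound from part (a), the actual lower posterior, and its bound. Helper names are our own.

```python
from fractions import Fraction as F

HYPS = ("A", "B")
# Example 1.2: mappings (mu(A)(heads), mu(B)(heads)); Alice's coin is double-headed
# or biased 3/4 towards heads, Bob's coin is fair.
A_SET = [(F(1), F(1, 2)), (F(3, 4), F(1, 2))]
PRIOR = {"A": F(1, 4), "B": F(3, 4)}  # hypothetical non-uniform prior


def lik(p_heads, ob):
    return p_heads if ob == "heads" else 1 - p_heads


def weight(mapping, ob):
    """w(ob, .) for one likelihood mapping."""
    vals = {h: lik(p, ob) for h, p in zip(HYPS, mapping)}
    total = sum(vals.values())
    return {h: v / total for h, v in vals.items()}


def update(prior, w):
    """mu_0 (+) w, Dempster's rule of combination."""
    norm = sum(prior[h] * w[h] for h in HYPS)
    return {h: prior[h] * w[h] / norm for h in HYPS}


def compare(ob, h="A", other="B"):
    """(actual upper, upper bound, actual lower, lower bound) for hypothesis h."""
    ws = [weight(m, ob) for m in A_SET]
    up = {k: max(w[k] for w in ws) for k in HYPS}
    lo = {k: min(w[k] for w in ws) for k in HYPS}
    posts = [update(PRIOR, w)[h] for w in ws]
    upper_bound = up[h] * PRIOR[h] / (up[h] * PRIOR[h] + lo[other] * PRIOR[other])
    lower_bound = lo[h] * PRIOR[h] / (lo[h] * PRIOR[h] + up[other] * PRIOR[other])
    return max(posts), upper_bound, min(posts), lower_bound


for ob in ("heads", "tails"):
    print(ob, compare(ob))
```

For heads both pairs coincide at $2/5$ and $1/3$, and for tails at $1/7$ and $0$, as part (a) predicts when $|\mathcal{H}| = 2$.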

For part (b), we again prove only the first equality; the second again follows by a
symmetric argument. We first show that $(\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob) > 0$, to establish
that the right-hand side is well defined. By way of contradiction, assume that $(\mathcal{P}_h)^*(ob) +
\sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob) = 0$. Since all terms are nonnegative, $(\mathcal{P}_h)^*(ob) = 0$, so we have $\mu_h(ob) = 0$ for all $\mu_h \in \mathcal{P}_h$; similarly,
for every $h' \ne h$, since $(\mathcal{P}_{h'})_*(ob) = 0$ and $\mathcal{P}_{h'}$ is finite, there exists $\mu_{h'} \in \mathcal{P}_{h'}$ such that
$\mu_{h'}(ob) = 0$. Because $\mathcal{A}$ is uncorrelated, we can find a $\mu \in \mathcal{A}$ such that $\mu(h')(ob) = 0$
for every $h' \in \mathcal{H}$, contradicting the assumption we made that $\mathcal{A}$ contains only likelihood
mappings that make every observation possible.

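Since $\mathcal{G} = (\mathcal{H}, \mathcal{O}, \mathcal{A})$ is uncorrelated, $\mathcal{A} = \prod_{h \in \mathcal{H}} \mathcal{P}_h$. Thus, there exists
a $\mu \in \mathcal{A}$ such that $\mu(h)(ob) = (\mathcal{P}_h)^*(ob)$ and $\mu(h')(ob) = (\mathcal{P}_{h'})_*(ob)$ when $h' \ne h$. (The
bounds are attained because each $\mathcal{P}_h$ is finite.) Since $\mathcal{E} = (\mathcal{H}, \mathcal{O}, \mu) \in \mathcal{S}(\mathcal{G})$, we have

$$w_{\mathcal{E}}(ob, h) = \frac{\mu(h)(ob)}{\sum_{h' \in \mathcal{H}} \mu(h')(ob)} = \frac{(\mathcal{P}_h)^*(ob)}{(\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob)},$$

and thus

$$\overline{w}_{\mathcal{G}}(ob, h) \ge \frac{(\mathcal{P}_h)^*(ob)}{(\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob)}. \tag{9}$$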

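To prove equality, it suffices to show that $(\mathcal{P}_h)^*(ob) / ((\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob)) \ge
w(ob, h)$ for all $w \in \mathbf{w}_{\mathcal{G}}$. So choose $w \in \mathbf{w}_{\mathcal{G}}$ and let the corresponding evidence space be
$\mathcal{E} = (\mathcal{H}, \mathcal{O}, \mu)$ in $\mathcal{S}(\mathcal{G})$. Given $h \in \mathcal{H}$ and $ob \in \mathcal{O}$, there are two cases. If $\mu(h)(ob) = 0$, then

$$w(ob, h) = 0 \le \frac{(\mathcal{P}_h)^*(ob)}{(\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob)}.$$

If $\mu(h)(ob) > 0$, then $(\mathcal{P}_h)^*(ob) \ge \mu(h)(ob) > 0$, so, since $\mu(h') \in \mathcal{P}_{h'}$ for every $h'$,

$$
\begin{aligned}
w(ob, h) &= \frac{\mu(h)(ob)}{\sum_{h' \in \mathcal{H}} \mu(h')(ob)}
= \frac{1}{1 + \sum_{h' \ne h} \dfrac{\mu(h')(ob)}{\mu(h)(ob)}} \\
&\le \frac{1}{1 + \sum_{h' \ne h} \dfrac{(\mathcal{P}_{h'})_*(ob)}{(\mathcal{P}_h)^*(ob)}}
= \frac{(\mathcal{P}_h)^*(ob)}{(\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob)}.
\end{aligned}
$$

Since $w$ was arbitrary, by the properties of sup,

$$\overline{w}_{\mathcal{G}}(ob, h) \le \frac{(\mathcal{P}_h)^*(ob)}{(\mathcal{P}_h)^*(ob) + \sum_{h' \ne h} (\mathcal{P}_{h'})_*(ob)}. \tag{10}$$

Equations (9) and (10) together give the result. ∎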